HUMAN 


a record of research
MAY, 1949 


VOL. 21 No. 2 


DYADIC ANOVA, AN ANALYSIS OF VARIANCE FOR 
VECTORS * 


BY JOHN W. TUKEY 
Princeton University 


I. INTRODUCTION 


PERHAPS more than any other statistical technique, the analysis

of variance is “all things to all people.” This paper discusses 

some of its applications to multiple variates, using vector ideas whenever 

appropriate. It takes its start, therefore, from the present state of the 

analysis of variance (or, as we shall say, of anova) for single variates. 

It will be worthwhile to discuss this for a little, in order to explain 

why the general form of some techniques has been discussed, while that 
of others has not. 

In its original applications to the elementary and difficult agronomical
problems, “Which variety (or fertilizer) is best? Am I sure of this?”
it had the two underlying purposes clearly stated and
reiterated by R. A. Fisher:


(1) the broadening of the logical basis of the experiment, and 


(2) the reduction to impotence of certain large sources of error,
such as litter differences in animal experimentation and fertility
gradients in plant experiments.


* Prepared in connection with research sponsored by the Office of Naval
Research.


In this connection, it is customary to ask, in a simple row by column 
design where at least one classification is a treatment under study, the 
following questions: 


(1) Is there evidence for a systematic effect between rows? 
(2) Is there evidence for a systematic effect between columns? 


(3) Is there evidence that these two specific rows differ “on the
average”?

(4) Is there evidence that these two specific columns differ “on the
average”?

(5) How strong is the evidence that this specified row “on the
average” gives a larger value than any other?

(6) How strong is the evidence that this specified column “on the
average” gives a larger value than any other?


It is also the practice to answer these questions. But the validity of 
the answers so far as our present knowledge goes is of very different 
sorts. 

Given simple hypotheses of additivity, lack of correlation,
homoscedasticity, and normality, it is easy to make answers to (1) and (2)
which have known behavior. It is even possible to get some idea of the
effect of violating the hypothesis of normality (which proves to be
small). In the present state of statistical theory and of the
correspondence with practice, the technique of the F-test, as it is used to
answer questions (1) and (2), is well understood and well-behaved in
comparison with most tests. This is not the case with the technique
of assigning “standard errors” to the apparent effects of rows and 
columns. There is, so far as I know, no completely self-consistent 
hypothesis which gives rise to the usual techniques. This lack corre- 
sponds to an error of formal judgment which is probably small in 
answering questions (3) and (4), and frequently large in answering 
questions (5) and (6). It would take us too far afield to discuss these 
questions further, but what we have said will indicate why we shall say 
no more about questions (3), (4), (5) and (6), or about their analogs 
for systems with many variates. 

In addition to answering questions (1) and (2), there are consistent
hypotheses under which the analysis of variance leads to estimates of
the so-called components of variance, and under which these estimates
have known properties. It is with the problems of


(i) estimating effect variances (= components of variance),
(ii) testing significance, and
(iii) choosing terms in which ordinary anova is most powerful


that we shall be principally concerned.

There is no reason to suppose, however, that there will be any 
difficulty in extending to many variates whatever satisfactory solutions 
may in time be found for the single variate case of questions (3), (4), 
(5) and (6). 

The present paper is a revision of part of a paper presented to the 
Institute of Mathematical Statistics in Princeton in November 1946, 
under the title “Vector Methods in the Analysis of Variance.” The
other two main parts of that paper dealt, one with the analysis of 
covariance in such a way as to contrast it with dyadic anova, and the 
other with the analysis of plurivariance. Since various accounts of 
the analysis of covariance are available, and since the analysis of pluri- 
variance is less frequently useful, it seemed fitting to publish the 
present section at this time. The discussion of the expectation formulas 
for sums of squares is much more general than in the original version. 

Other accounts of analysis of variance procedures for vectors have 
been published [1, Bartlett, 1947], but, in their concentration on tests 
of significance, they seem to the writer to have lost much that in ordinary 
anova is most useful and revealing. The present account is intended 
to provide a simple and direct extension of the ordinary case, in which 
we can proceed as usual—and get answers to the usual questions. 


2. Summary 


The application of vector notions and methods to forces, displace- 
ments, etc., in three-dimensional space, is not unfamiliar to many, and it 
is perhaps natural to assume that this is a fair model for all vector 
situations; this is not so, for the rigid rotations which preserve all 
relevant facts do not exist in the general case. There is no logical 
difficulty, and often many advantages, in saying 


(3 oranges, 2 apples) + (1 orange, 4 apples) = (4 oranges, 6 apples).


It is a maxim of arithmetic that it is not proper to add 2 oranges to 
1 apple; this is good arithmetic but may be poor vector algebra. For, 


(2 oranges, 0) + (0, 1 apple) = (2 oranges, 1 apple),


is a meaningful and useful statement. It is also perfectly reasonable to
change coordinates in such situations so that, for example, (2 oranges,
1 apple) has coordinates (−1, 3) in terms of ½(apples − oranges) and
½(apples + oranges).

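If we are to have an analysis of variance, we must have squares,
and the solution is

\[
(2\ \text{oranges},\ 1\ \text{apple})^2 =
\begin{pmatrix}
4\ \text{orange}^2 & 2\,(\text{orange})(\text{apple}) \\
2\,(\text{orange})(\text{apple}) & 1\ \text{apple}^2
\end{pmatrix}.
\]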


It turns out that this yields a satisfactory analysis of variance with 
useful properties. 
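
A minimal sketch of the two statements above may help. It forms the square of (2 oranges, 1 apple) as an array of products and checks the coordinates (−1, 3) in the new terms. The names `outer` and `combine` are illustrative assumptions, and the units are carried only in the comments.

```python
from fractions import Fraction


def outer(u, v):
    """Array of all products u[i] * v[j] (the dyad of u and v)."""
    return [[a * b for b in v] for a in u]


# (2 oranges, 1 apple), written as (oranges, apples) counts.
basket = [2, 1]
square = outer(basket, basket)  # entries: orange^2, orange*apple, apple^2

# New terms, each expressed in (oranges, apples) coordinates:
# e1 = 1/2 (apples - oranges), e2 = 1/2 (apples + oranges).
half = Fraction(1, 2)
e1 = [-half, half]
e2 = [half, half]


def combine(a, b):
    """Return a*e1 + b*e2 in (oranges, apples) coordinates."""
    return [a * e1[k] + b * e2[k] for k in range(2)]


recovered = combine(-1, 3)
print(square)
print(recovered == basket)
```

The exact fractions avoid any rounding in the change of coordinates.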

In Chapter II, this process is discussed at some length, and it is 
shown that estimating effect variances is the same as for a single variate. 


Chapter III considers tests of significance, and it develops that the 
theory exists for analogs of the usual tests, but that the necessary 
tables have not been computed. 


Chapter IV considers fiducial inference and finds that the necessary 
theory is not available. 


Chapter V applies dyadic anova to the problem of the choice of 
terms. For a given type of experiment on a given plant, should the 


analysis be made in bushels per acre? or in $\sqrt{\text{bushels per acre}}$? or in
log bushels per acre? Questions like this arise in every field of experi- 
ment; and when conditions are right and the data is extensive enough, 
this application of dyadic anova may help to answer the question. 

The Appendix presents mathematical details which are not essential 
to the application of the methods discussed. 


3. Basic preliminaries 

The term variate will be used in the sense in which random variable 
is often used, as something which takes on values according to a prob- 
ability distribution,—in practice it is identified with a concept such as 
“the velocity of light measured by apparatus 14.” The values of 
different variates may be numbers, vectors, etc., and if we wish to dis- 
tinguish these cases we shall speak of single variates, of vector variates, 




etc. We shall not use a special notation to distinguish between a variate 
and its values, but shall depend on the context to make clear which is 
meant. Whenever two variates, or the values of two variates appear in 
the same equation, we shall tacitly assume that they have a joint prob- 
ability distribution. Equations between variates mean, by definition, 
equations between each and every set of possible values. 

Since sums of vectors and multiples of vectors are vectors, each 
vector variate, say x, has an expected value (mean value, average value), 
E(x). A vector variate has a normal distribution if the joint distribu- 
tion of the components of the vector in one coordinate system is normal. 
Normality in one coordinate system implies normality in all coordinate 
systems. Notice that nothing has been said about nonsingularity, and a 
vector (y, z), with y = 1 and z normally distributed about −2 with
variance 3, is a normally distributed vector variate. 

It is a pleasure to thank T. E. Harris for helpful comments and 
suggestions. 


II. ESTIMATING EFFECT VARIANCES 
4. The analysis of the two-way table 


Consider the elementary analysis of variance situation. Observations,
which we shall denote by x, using appropriate subscripts when necessary,
are available in rc cells arranged in r rows and c columns, and the
individual entries are supposed to arise from general, row, column, and
cell effects, where it is further assumed that


(4.1) the effects are exactly additive, 
(4.2) individual effects of different kinds are uncorrelated, 


(4.3) individual effects of the same kind have a covariance charac- 
teristic of the kind (which may be zero), 


(4.4) the expectation of each cell effect is the same. 


Nothing has been assumed about 
(i) equality of variances of effects of the same kind, 
(ii) normality of general, row, column, or cell effects, 
(iii) the vanishing of certain averages of effects,
(iv) independence of individual effects of the same kind, 


since these additional conditions, frequently assumed, are entirely
unnecessary for the simple results which we are now going to state.
(Some of them will be needed when we come to discuss tests of signifi- 
cance, fiducial and confidence intervals, etc.) 

It is important to notice that our conditions cover the two best-known
special cases [2, Eisenhart 1947] where (Model I) row and
column effects are constant while cell effects are independently and
normally distributed with mean zero, or (Model II) row, column and
cell effects are all independently and normally distributed.

To each kind of effect there corresponds an effect
variance or component of variance. This is defined as the average value,
over all tables, of the mean squared deviation of the effects of that sort
found in a given table. (Since the individual effects cannot be observed,
this mean squared deviation cannot be observed directly either!)

The usual procedure of the analysis of variance divides the sum of 
the squares of all the observations into an exact sum of four terms, 


(1) rc times the square of the grand mean (= SSMx),

(2) the row sum of squares (= SSRx),

(3) the column sum of squares (= SSCx),

(4) the balance (interaction, discrepance, error, row-by-column, or
cell) sum of squares (= SSBx).


Simple algebra, carried out in the Appendix, shows that the last three
sums of squares have expectations which are simple linear combinations
of the effect variances. Denoting the row, column, and cell effect
variances by $\sigma_r^2$, $\sigma_c^2$, and $\sigma_b^2$, these linear combinations are

\[
\begin{aligned}
(r-1)c\,\sigma_r^2 + (r-1)\,\sigma_b^2 &= E(\mathrm{SSR}x), \\
(c-1)r\,\sigma_c^2 + (c-1)\,\sigma_b^2 &= E(\mathrm{SSC}x), \\
(r-1)(c-1)\,\sigma_b^2 &= E(\mathrm{SSB}x).
\end{aligned}
\]

Here the coefficients of $\sigma_b^2$ are the degrees of freedom, r − 1, c − 1, and
(r − 1)(c − 1), and the ratios of sums of squares to degrees of freedom
give the mean squares. By subtracting mean squares and dividing by
the appropriate constant, we obtain quantities whose expectations are
$\sigma_r^2$, $\sigma_c^2$, and $\sigma_b^2$; these are known as the components of mean square. Thus
for r = 2, c = 4, the analysis of the table of numbers
1 3 2 
5 3 2 6 


yields the anova table

              DF    SS              MS       CMS

Mean ......    1    8(3.25 − M)²
Rows ......    1    4.5             4.50     0.58*
Columns ...    3    9.5             3.16*    0.50
Balance ...    3    6.5             2.16*    2.16*

where 0.50 = (3.16* − 2.16*)/2 and 0.58* = (4.50 − 2.16*)/4.


We have used, as we shall from this point on, the abbreviations:
DF = degrees of freedom, SS = sum of squares, MS = mean square,
CMS = component of mean square. We write, for example, SSBx for
6.5, MSCx for 3.16* and CMSRx for 0.58*. If (4.1) to (4.4) hold,
then 0.58* is a mean estimate of $\sigma_r^2$, 0.50 of $\sigma_c^2$, 2.16* of $\sigma_b^2$. The entry
for the sums of squares for the mean is 8 (= rc) times the square of
the difference between the observed mean 3.25 and the assumed or
contemplated mean M. (More assumptions are required before a CMS
for the mean can be reasonably calculated. ) 
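
The decomposition into four sums of squares, and the passage from mean squares to components of mean square, can be sketched for any r by c table of single variates. The table used here is hypothetical (it is not the table analyzed in the text), the name `two_way_anova` is an assumption, and the contemplated mean M is taken as zero.

```python
def two_way_anova(table):
    """Sums of squares, mean squares and components of mean square
    for an r-by-c table with one observation per cell."""
    r, c = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(r)) for j in range(c)]
    total = sum(row_sums)

    ss_all = sum(x * x for row in table for x in row)
    ss_mean = total ** 2 / (r * c)              # rc times (grand mean)^2
    ss_rows = sum(s * s for s in row_sums) / c - ss_mean
    ss_cols = sum(s * s for s in col_sums) / r - ss_mean
    ss_bal = ss_all - ss_mean - ss_rows - ss_cols

    ms_rows = ss_rows / (r - 1)
    ms_cols = ss_cols / (c - 1)
    ms_bal = ss_bal / ((r - 1) * (c - 1))

    return {
        "SS": (ss_mean, ss_rows, ss_cols, ss_bal),
        "MS": (ms_rows, ms_cols, ms_bal),
        # Subtract the balance mean square, then divide by the number of
        # cells that share each row (c) or each column (r).
        "CMS": ((ms_rows - ms_bal) / c, (ms_cols - ms_bal) / r, ms_bal),
    }


# Hypothetical 2-by-4 table, chosen only for illustration.
result = two_way_anova([[3, 5, 4, 8], [6, 4, 7, 11]])
print(result)
```

The four sums of squares add up exactly to the sum of the squares of all observations, which is the identity the analysis rests on.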


5. Extension to vectors 


In Sections A and B of the Appendix, the usual expectation formulas 
have been developed by algebra. Now, what kind of algebra did we 
need? A detailed check will show that we used just these things: 


(i) the x’s could be multiplied by real numbers and the results
added together freely with the usual rules of algebra;


(ii) $x_i x_j$ is defined, can be multiplied by real numbers and the
results added together freely with the usual rules of algebra;


(iii) (2, 2) 25 = T2F3, = 773 
Nowhere did we need or use the otherwise frequently useful relation 


This means that there was no need for the x’s to be numbers, and we 
shall investigate and make good use of cases where they are not. 




In his Vector Analysis, J. Willard Gibbs [5] worked with three- 
dimensional vectors in many ways. One of the things he did was to 
multiply them together and get, not a number (as he would from a dot 
or scalar product), nor a vector (as he would from a cross or vector 
product), but something new. These things he called dyads, and their 
sums he called dyadics. We need to know only three things about them, 
namely : 

(i) geometrically, they are as completely independent of a coordi- 
nate system as are vectors; 


(ii) in a given coordinate system, they are represented by square 
arrays of components, and 


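\[
(a, b, c)(d, e, f) =
\begin{pmatrix} ad & ae & af \\ bd & be & bf \\ cd & ce & cf \end{pmatrix},
\]
so that
\[
(10, 12, 3)(1, -4, 2) =
\begin{pmatrix} 10 & -40 & 20 \\ 12 & -48 & 24 \\ 3 & -12 & 6 \end{pmatrix};
\text{ and}
\]


(iii) they are added componentwise:

\[
\begin{pmatrix} 4 & 9 & 2 \\ 3 & 5 & 7 \\ 8 & 1 & 6 \end{pmatrix}
+
\begin{pmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{pmatrix}
=
\begin{pmatrix} 12 & 10 & 8 \\ 6 & 10 & 14 \\ 12 & 10 & 8 \end{pmatrix}.
\]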


The immediate conclusion is that if x is a three-dimensional vector,
we may run an analysis of variance on it, and the sums of squares and 
mean squares will be dyadics with 9 components. 
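
The two rules just stated, products taken component by component and sums taken componentwise, are easy to sketch. The helper names `dyad` and `add` are illustrative assumptions; the vectors are the ones used in the numerical product above.

```python
def dyad(u, v):
    """Square array whose (i, j) entry is u[i] * v[j]."""
    return [[a * b for b in v] for a in u]


def add(p, q):
    """Componentwise sum of two arrays of the same shape."""
    return [[x + y for x, y in zip(row_p, row_q)]
            for row_p, row_q in zip(p, q)]


u = [10, 12, 3]
v = [1, -4, 2]

product = dyad(u, v)

# A "sum of squares" of vectors is a sum of dyads of each vector with itself,
# and is therefore a symmetric array with 9 components in three dimensions.
sum_of_squares = add(dyad(u, u), dyad(v, v))
print(product)
print(sum_of_squares)
```

Note that `dyad(u, v)` and `dyad(v, u)` are transposes of one another, while the square `dyad(u, u)` is always symmetric.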

This is, of course, not the most general case, for in 1938 Hassler
Whitney [11] showed that, under conditions implied by (i) of Section 4,
there is always a natural way to define $x_i x_j$ as what he termed the
“tensor product,” when the $x_i$ belong to an arbitrary Abelian group.
Thus there is an “analysis of variance” for observations with values in
any Abelian group.

We shall not try to be especially general, but we have no reason
to prefer the use of three-dimensional vectors to that of six, two, or
five-dimensional ones, where, for example,

\[
(2, 3, 5, -2, -6)^2 =
\begin{pmatrix}
4 & 6 & 10 & -4 & -12 \\
6 & 9 & 15 & -6 & -18 \\
10 & 15 & 25 & -10 & -30 \\
-4 & -6 & -10 & 4 & 12 \\
-12 & -18 & -30 & 12 & 36
\end{pmatrix}.
\]


6. Other models 


We emphasized that (4.1) to (4.4) included Models I and II of
Eisenhart [2, 1947], but these are not general enough to meet the
practical needs of the situation. In practice, randomization and
sampling from finite populations are the order of the day. We want
models to cover these situations, so we introduce the following:


MODEL III (Pure randomization). There are rc fixed values which
are randomly arranged as the cell effects. There are c fixed values
which are randomly arranged as the column effects. There are r fixed
values which are randomly arranged as the row effects. These random
arrangements are made independently.


Models of this general type have been considered by Welch [10] 
and by Pitman [8]. 


MODEL IV (Finite populations). The rc cell effects are a randomly
arranged sample of rc from a finite population of size $N$. The c column
effects are a randomly arranged sample of c from a second finite
population of size $N_c$. The r row effects are a randomly arranged
sample of r from a third finite population of size $N_r$. These random
samples are drawn and arranged independently.


Clearly Model III is the special case of Model IV when $N = rc$,
$N_c = c$, $N_r = r$.


MODEL V. Each cell effect is an independent sample of one from a
separate population with mean zero and variance $\sigma_{ij}^2$. Each column
effect is an independent sample of one from a separate population with
mean zero and variance $\sigma_{0j}^2$. Each row effect is an independent sample
of one from a separate population of mean zero and variance $\sigma_{i0}^2$.
These populations are infinite. The selections are collectively at random.


Clearly the cell effects here are a reasonable facsimile of “ experi- 
mental error” in the narrowest sense. Further, Model V includes 
Model II. 




We shall shortly show that these three models are included in the 
scope of our hypotheses and hence of our expectation formulas. Since 
compound effects formed by adding up independent parts satisfying 
(4.1) to (4.4) are easily shown to satisfy (4.1) to (4.4) themselves, 
it follows that our hypotheses and expectation formulas include 


MODEL X. Each cell, row, or column effect is the sum of three
independent subeffects. The first set of subeffects satisfies Model III,
the second set of subeffects satisfies Model IV. The third set of
subeffects satisfies Model V. The sets of subeffects are collectively at random.


We have now to verify that (4.1) to (4.4) are in fact satisfied,
and to calculate $\sigma_b^2$, $\sigma_c^2$ and $\sigma_r^2$ in terms of the populations we have
introduced. Reference to Section C of the Appendix shows that for
Model V, which clearly satisfies (4.1) to (4.4),

\[
(\text{Model V})\quad
\begin{cases}
\sigma_c^2 = \text{average variance of column effects}, \\
\sigma_b^2 = \text{average variance of cell effects}, \\
\sigma_r^2 = \text{average variance of row effects}.
\end{cases}
\]


Thus Model V is disposed of. 


The work of Irwin and Kendall [7] has shown the great advantages
of defining the variance of a finite population of size N as

\[
\sigma^2 = \frac{1}{N-1} \sum_{1}^{N} (\text{individual value} - \text{population mean})^2 .
\]

We shall, accordingly, divide by N − 1 and not N. The variance of
the mean of a sample of n is

\[
\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right),
\]

and comparing the values for n = 1 and n = 2, we see that the
covariance of any two elements in a sample is

\[
-\frac{\sigma^2}{N}.
\]

Thus (4.3), which was the least evident hypothesis for Model IV,
holds. Reference to Section C of the Appendix shows that

\[
(\text{Model IV})\quad
\begin{cases}
\sigma_b^2 = \text{variance of cell effect population}, \\
\sigma_c^2 = \text{variance of column effect population}, \\
\sigma_r^2 = \text{variance of row effect population}.
\end{cases}
\]


The formulae for Model III are, of course, the same. 

Thus the usual formulas for estimating effect variances hold for 
vectors satisfying Model I, II, III, IV, V or X. 

It seems to the writer that Model X provides, for the first time 
explicitly, that generality which is needed for practical use of the 
analysis of variance. 
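
The finite-population facts used for Model IV can be checked by exact enumeration. With the variance defined with divisor N − 1, the mean of a sample of n drawn without replacement has variance (σ²/n)(1 − n/N), and two distinct sampled elements have covariance −σ²/N. The population below is hypothetical, and the function name is an assumption.

```python
from fractions import Fraction
from itertools import combinations, permutations

population = [2, 5, 7, 10]  # hypothetical finite population
N = len(population)
mu = Fraction(sum(population), N)

# Variance with divisor N - 1, as adopted in the text.
var = sum((x - mu) ** 2 for x in population) / (N - 1)


def var_of_sample_mean(n):
    """Exact variance of the mean over all samples of size n
    drawn without replacement (every sample equally likely)."""
    samples = list(combinations(population, n))
    return sum((Fraction(sum(s), n) - mu) ** 2 for s in samples) / len(samples)


# Exact covariance of the first and second elements drawn.
pairs = list(permutations(population, 2))
cov = sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)

print(var, cov, [var_of_sample_mean(n) for n in range(1, N + 1)])
```

A negative covariance of this size is exactly what hypothesis (4.3) allows: a covariance characteristic of the kind of effect.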


7. The first example


The data used here were first published by Immer, Hayes, and 
Powers [6], and have been used by Fisher [4] in his The Design of 
Experiments and by Yates and Cochran [13] in their paper on “The 
Analysis of Groups of Experiments.” 

The part of the experiment selected by Fisher as an example con- 
sists of barley yields for 2 years, 5 varieties and 6 locations. The 
yields for the two years will be treated as one set of coordinates of a 
two-dimensional vector variable. 

Rounded off to the nearest integer, the data are listed in Table 1.


TABLE 1


Yields of barley, 1931 and 1932.
Data of Immer et al.
(For each location the upper line gives the 1931 yields, the lower line the 1932 yields.)


                            VARIETIES

Location   Year      M      S      V      T      P    Total

UF         1931     81    105    120    110     98      514
           1932     81     82     80     87     84      414
W          1931    147    142    151    192    146      778
           1932    100    116    112    148    108      584
M          1931     82     77     78    131     90      458
           1932    103    105    117    140    130      595
C          1931    120    121    124    141    125      631
           1932     99     62     96    126     76      459
GR         1931     99     89     69     89    104      450
           1932     66     50     97     62     80      355
D          1931     87     77     79    102     96      441
           1932     68     67     67     92     94      388

Total      1931    616    611    621    765    659     3272
           1932    517    482    569    655    572     2795


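Recalling that the square of (147, 100) is

\[
\begin{pmatrix} 21609 & 14700 \\ 14700 & 10000 \end{pmatrix},
\]

direct calculation shows that the sum of squares for all cells is

\[
\begin{pmatrix} 380\,944 & 315\,381 \\ 315\,381 & 277\,625 \end{pmatrix},
\]

that the sum of squares of the sums by variety is

\[
\begin{pmatrix} 2\,157\,924 & 1\,844\,346 \\ 1\,844\,346 & 1\,579\,583 \end{pmatrix},
\]

that the sum of squares of the sums by location is

\[
\begin{pmatrix} 1\,874\,386 & 1\,560\,145 \\ 1\,560\,145 & 1\,353\,727 \end{pmatrix},
\]

and that the square of the total is

\[
\begin{pmatrix} 10\,705\,984 & 9\,145\,240 \\ 9\,145\,240 & 7\,812\,025 \end{pmatrix}.
\]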


Dividing these sums of squares by the usual divisors, and making the 
usual subtractions, the analysis of variance exhibited in Table 2 is 
obtained. 
TABLE 2
Dyadic analysis of variance
(First coordinate = 1931 yield, second coordinate = 1932 yield)


              DF           SS                    MS                 CMS

Mean           1   30(109.1 − M, 93.2 − M)²

Locations      5   [ 18011    7188 ]    [ 3602   1438 ]    [ 688   280 ]
                   [  7188   10304 ]    [ 1438   .... ]    [ 280   375 ]

Varieties      4   [  2788    2550 ]    [  697    637 ]    [  89   100 ]
                   [  2550    2922 ]    [  637    731 ]    [ 100    89 ]

Error         20   [  3279     802 ]    [  164     40 ]    [ 164    40 ]
                   [   802    3977 ]    [   40    199 ]    [  40   199 ]


It should be noted that this table includes, explicitly, analyses of 
variance for the 1931 and 1932 yields and the raw material for an 
analysis of covariance. 
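
The arithmetic of this section can be sketched directly from the rounded yields of Table 1. The code forms the four raw dyadics quoted in the text, then divides by the usual divisors and subtracts. All function and variable names are illustrative assumptions.

```python
# Rounded yields from Table 1: rows are locations UF, W, M, C, GR, D.
y1931 = [[81, 105, 120, 110, 98], [147, 142, 151, 192, 146],
         [82, 77, 78, 131, 90], [120, 121, 124, 141, 125],
         [99, 89, 69, 89, 104], [87, 77, 79, 102, 96]]
y1932 = [[81, 82, 80, 87, 84], [100, 116, 112, 148, 108],
         [103, 105, 117, 140, 130], [99, 62, 96, 126, 76],
         [66, 50, 97, 62, 80], [68, 67, 67, 92, 94]]
n_loc, n_var = 6, 5


def dyad_sq(v):
    """Square of a vector: array of all products v[i] * v[j]."""
    return [[a * b for b in v] for a in v]


def add(p, q):
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]


def sub(p, q):
    return [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]


def divide(p, k):
    return [[x / k for x in row] for row in p]


def sum_sq(vectors):
    """Sum of the squares (dyadics) of a list of 2-vectors."""
    total = [[0, 0], [0, 0]]
    for v in vectors:
        total = add(total, dyad_sq(v))
    return total


def vec_sum(vectors):
    return tuple(sum(v[k] for v in vectors) for k in range(2))


cells = [[(y1931[i][j], y1932[i][j]) for j in range(n_var)] for i in range(n_loc)]
all_cells = [v for row in cells for v in row]
loc_totals = [vec_sum(row) for row in cells]
var_totals = [vec_sum([cells[i][j] for i in range(n_loc)]) for j in range(n_var)]

raw_cells = sum_sq(all_cells)
raw_var = sum_sq(var_totals)
raw_loc = sum_sq(loc_totals)
raw_total = dyad_sq(vec_sum(all_cells))

# Usual divisors (cells per total) and usual subtractions.
correction = divide(raw_total, n_loc * n_var)
ss_loc = sub(divide(raw_loc, n_var), correction)
ss_var = sub(divide(raw_var, n_loc), correction)
ss_err = sub(sub(sub(raw_cells, correction), ss_loc), ss_var)
print(ss_loc, ss_var, ss_err)
```

Run on these rounded data, the sketch reproduces the four raw dyadics and the 1931 and cross-product entries of the sums of squares. The (1932, 1932) entries it gives for varieties and error are about 2863 and 4017 rather than the 2922 and 3977 printed in Table 2, so the printed table may rest on slightly different arithmetic or on unrounded yields.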




8. New coordinates 


If any two linear combinations of the 1931 and 1932 yields possess 
unique and interesting properties, the average yield and the half- 
difference of yield would be expected to be the two. The transforma- 
tion of mean squares (or sums of squares or components of mean 
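square) is simple. If the mean square of (x, y) is

\[
\begin{pmatrix} A & B \\ B & D \end{pmatrix},
\]

and if

\[
u = ax + by, \qquad v = cx + dy,
\]

then the mean square of (u, v) is

\[
\begin{pmatrix}
a^2A + 2abB + b^2D & acA + (ad + bc)B + bdD \\
acA + (ad + bc)B + bdD & c^2A + 2cdB + d^2D
\end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} A & B \\ B & D \end{pmatrix}
\begin{pmatrix} a & c \\ b & d \end{pmatrix},
\]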
where the multiplication is matrix multiplication. The result of this 
transformation is shown in Table 3. 
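
The change of coordinates can be sketched as a pair of matrix multiplications. Here it is applied to the error mean square and the variety component of mean square of Table 2, with u the average yield and v the half-difference, so that a = b = c = ½ and d = −½. The helper names are illustrative assumptions.

```python
def matmul(p, q):
    """Product of two 2-by-2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(p):
    return [[p[j][i] for j in range(2)] for i in range(2)]


def change_coords(t, s):
    """Mean square of (u, v) = t (x, y), given the mean square s of (x, y)."""
    return matmul(matmul(t, s), transpose(t))


# u = (x + y)/2 (average yield), v = (x - y)/2 (half-difference of yields).
t = [[0.5, 0.5], [0.5, -0.5]]

error_ms = [[164, 40], [40, 199]]      # from Table 2
variety_cms = [[89, 100], [100, 89]]   # from Table 2

new_error_ms = change_coords(t, error_ms)
new_variety_cms = change_coords(t, variety_cms)

# The same (1, 1) entry from the explicit formula a^2 A + 2ab B + b^2 D.
a, b = t[0]
explicit = a * a * 164 + 2 * a * b * 40 + b * b * 199
print(new_error_ms, new_variety_cms, explicit)
```

Rounded to integers, the transformed error mean square has diagonal entries 111 and 71, the error variance estimates quoted in the discussion of Table 3.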


TABLE 3 
Mean squares and their components 
(First coordinate = average yield; second coordinate = ½(1931 yield − 1932 yield))


DF MS CMS
Mean 1 30(101 − M, 8)²
Varieties 4 [ 


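Facility in changing coordinates at any stage in the analysis is
inherent in the vectorial (or dyadic) nature of the computations. It is
illustrated by the fact that

\[
\begin{pmatrix} 94.5 & 0 \\ 0 & -5.5 \end{pmatrix}
=
\begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{pmatrix}
\begin{pmatrix} 89 & 100 \\ 100 & 89 \end{pmatrix}
\begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & -\tfrac12 \end{pmatrix}.
\]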

We can permute coordinate changes with the arithmetic operations we 
use at will. Just as an estimate of an effect component in an ordinary 
analysis of variance is all too often negative (because of the natural 
vagaries of sampling error), so the estimated $\sigma^2$ for variety is neither
positive definite nor semidefinite. Sampling errors will have their way.
Lacking significance tests at the moment, we must phrase our
interpretations of these effect variances, obtained on so few degrees of
freedom, in a very qualitative way. But these results suggest:


(i) that there is an effect of location on both the average yield
and the difference of yields, and that these effects may be
positively correlated,


(ii) that the effect of variety is likely to be solely on the average
yield, and


(iii) that there is an effect of error on both average yield and
difference of yields, while any correlation between these is
doubtful.


Of these suggestions, we may test a few with the classical significance
tests. The estimates 111 and 71 of the error variances in the two
coordinates are, under the usual assumptions, both $\chi^2$'s on 20 df. Their
ratio

\[
\frac{111}{71} = 1.56
\]

lies between the 15 per cent and 20 per cent points of F, and is to be
judged nonsignificant, even if the $w_i$ were normally distributed. If
there had been more degrees of freedom and consequent significance,
the difference (111 − 71 = 40) would probably have been ascribed to
plot-to-plot variability. Under suitable assumptions, which we shall
not discuss at this point, the correlation in the location part could be
calculated as

\[
r = \frac{78}{\sqrt{(126)(406)}} = 0.34 \quad (\text{on 5 df}),
\]

and tested by the usual tests, to be found far from significant.




9. A backward look 


In the last section we have done something which we are accustomed
to do all the time in the analysis of variance: we have replaced
two values by their mean and a multiple of their difference. In a sense
we have isolated a single degree of freedom, but we have done it
among the independent variables and not in the usual place. It is
natural to ask if it has made any difference in the results, and one
way to approach the answer is to set out the conventional analysis
with an additional classification for this degree of freedom, which is
done in Table 4.


TABLE 4


Conventional analysis of variance


                         SS      DF      MS

Locations              21386      5    4277
Varieties               5416      4    1354
Years                   3785      1    3785
L × V                   4430     20     222
L × Y                   7010      5    1402
V × Y                    316      4      79
L × V × Y (= error)     2826     20     141


If we stop to consider what variable corresponds to the single degree
of freedom called years, we recall that it is

\[
\frac{1}{\sqrt{2}}\,(\text{1931 yield} - \text{1932 yield})
\]

and that its variance is just twice the variance of

\[
\frac{\text{1931 yield} - \text{1932 yield}}{2},
\]

which is the second coordinate in Table 3. It is no shock, therefore,
to note that every mean square in Table 4 is twice a component of a
mean square in Table 3 within the accuracy of our rounded
computations. The one case in which this is not obvious is $3785 \approx 2(30)8^2$.

We were inquiring what loss of information or other convenience 
there was in using the conventional Table 4 instead of the novel Table 3. 
Each mean square in Table 3 requires three numbers to specify it (if 




it were not for the symmetry of squares and sums of squares it would 
need four). Two of these are represented in the mean square column 
of Table 4, but the third is not. Table 3 must, therefore, be giving us 
information about the situation that Table 4 does not. 

To understand the situation, let us look at a simpler situation— 
say an imaginary uniformity trial on 1001 plots for two years, year A 
and year B. Let the dyadic analysis run as in Table 5. 


TABLE 5 
Dyadic anova 
(Coordinates—yields in years A and B) 
SS DF MS 
1900000 1900000 1900000 1900000 


The results of rotating into the coordinates of sum and difference of
yields, each divided by $\sqrt{2}$, are given in Table 6.


TABLE 6 
Dyadic anova 
(Coordinates: $2^{-1/2}$ times sum and difference of yields)
SS DF MS 
M 3850000 50000 1 3850000 50000 
50000 50000 50000 50000 
110000 104 110 
104 70000] 70] 


The ordinary analysis would, then, be as in Table 7. 


TABLE 7
Ordinary anova

             SS       DF       MS

Year       50000       1    50000
Plots     110000    1000      110
Error      70000    1000       70




The terms we have lost are the off-diagonal terms in the mean squares 
of Table 6 which involve: 


(i) the covariance of the year-to-year difference and the average 
effect, taken from plot-to-plot, 

(ii) the covariance of the error in the difference of yields and the 
error in sum of yields. 


The first effect is clearly of frequent occurrence in real data, and one
that is well worth detecting when degrees of freedom enough are at
hand. The second effect could arise, for example, if the errors on
the two years were uncorrelated, but the error variance during year A
were different from that on year B. One thing, then, that the dyadic
anova makes evident, and the conventional analysis with one more
classification overlooks, is a certain sort of heteroscedasticity, a change
of error variance with the special variable.

This last result gives us the essential clue; in order to match the
insight of the dyadic anova in this case, we would require three
conventional analyses:


(i) a conventional anova with one extra classification,
(ii) a conventional anova for year A,
(iii) a conventional anova for year B.


If we had had more than 2 “ years,” the number of conventional analyses 
required to obtain the same insight would increase, and one of them 
would involve more and more complex calculations to isolate single 
degrees of freedom. 

We shall return to these interrelations when we have discussed tests 
of significance. 

Summing up the development to date, we have found that: 


(a) the usual formulas for the estimation of component variances 
by anova can be equally well applied when the quantities 
analyzed are finite- (or even infinite-) dimensional vectors ; 


(b) the square of an n-component vector has n? components, which 
are usually written in a square array; 


(c) the resulting dyadic anova is more penetrating than a single 
conventional anova with one additional classification; its com- 
ponents can be calculated from a sufficient number of conven- 
tional analyses. 




III. TESTS OF SIGNIFICANCE 
10. Introduction 


To apply tests of significance in conventional anova, it is cus- 
tomarily assumed that, in addition to conditions (4.1) to (4.4), 


(10.1) the variance of all cell effects is the same, 
(10.2) the cell effects are jointly normally distributed. 
Now let us recall formulas (B.3) of the Appendix which imply that


(a) column effects do not affect the row sum of squares, 
(b) row effects do not affect the column sum of squares, 
(c) neither column nor row effects affect the balance sum of squares. 


Thus, in conventional anova, when the null hypothesis “the column
effects are equal” holds, the column sum of squares and the balance
sum of squares depend only on the cell effects. Exactly the same thing
happens in dyadic anova. (We can, of course, interchange “row”
and “column.”)

In conventional anova, the next things are to show that these sums
of squares are distributed like multiples of $\chi^2$, and that these two sums
are independently distributed.

Let us split these two conclusions into three parts: 


(10.3) the sums of squares are independent, 


(10.4) the sums of squares are sums of squares of uncorrelated
normal variables,


(10.5) the variances of these uncorrelated normal variables are the 
same. 


These results are established in Section C of the Appendix. 


11. The analog of the F-ratio 


When we have a one-dimensional quantity distributed like a sum of
squares in normal variables, we say it is distributed like a multiple of $\chi^2$.
We can carry this mode of speech over to more components, but it must
be done carefully, for if the quantity whose variance is

\[
\begin{pmatrix} 10 & 3 \\ 3 & 1 \end{pmatrix}
\]

is a multiple of a quantity whose variance is

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]

then the multiplier is, essentially,

\[
\begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix},
\]

as can easily be verified. (Let the coordinates of the second be $y_1$ and $y_2$,
and let $x_1 = y_1 + 3y_2$, $x_2 = y_2$; then
$\mathrm{Var}(x_1) = \mathrm{Var}(y_1) + 6\,\mathrm{Cov}(y_1, y_2) + 9\,\mathrm{Var}(y_2) = 1 + 0 + 9 = 10$,
$\mathrm{Cov}(x_1, x_2) = \mathrm{Cov}(y_1, y_2) + 3\,\mathrm{Var}(y_2) = 0 + 3 = 3$,
$\mathrm{Var}(x_2) = \mathrm{Var}(y_2) = 1$.) It is easy to see from this
that any other linear transformation which serves the same purpose
is the product of $\begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}$ and an orthogonal matrix which leaves $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$


invariant. 
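
The verification can be sketched numerically: a multiplier M carries a quantity of unit variance into one of variance M Mᵗ, and following M by any orthogonal matrix leaves that variance unchanged. The quarter-turn used as the orthogonal matrix here is an illustrative choice, as are the helper names.

```python
def matmul(p, q):
    """Product of two 2-by-2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(p):
    return [[p[j][i] for j in range(2)] for i in range(2)]


def variance_after(m):
    """Variance of m y when y has the identity as its variance."""
    return matmul(m, transpose(m))


multiplier = [[1, 3], [0, 1]]        # x1 = y1 + 3 y2, x2 = y2
quarter_turn = [[0, -1], [1, 0]]     # an orthogonal matrix (assumed example)

direct = variance_after(multiplier)
rotated_first = variance_after(matmul(multiplier, quarter_turn))
print(direct, rotated_first)
```

Both results equal the variance with entries 10, 3, 3, 1, which is why the multiplier is determined only "essentially".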
In one dimension we used

\[
F = \frac{\text{row mean square}}{\text{error mean square}},
\]

which can also be written,

\[
F\,(\text{error mean square}) - (\text{row mean square}) = 0.
\]

Again in the one dimensional case, the applications are easiest for F,
but the algebra is a little simpler for $\theta$, where

\[
\theta\,(\text{error sum of squares} + \text{row sum of squares}) - (\text{error sum of squares}) = 0.
\]

Clearly

\[
F = \frac{(r-1)(c-1)(1-\theta)}{(r-1)\,\theta},
\]

where the numbers of degrees of freedom enter as the factors between
sums of squares and mean squares.

In more than one dimension, the equation for $\theta$ becomes

\[
\bigl|\, \theta\,(\text{error sum of squares} + \text{row sum of squares}) - \text{error sum of squares} \,\bigr| = 0,
\]

where the vertical bars indicate that we are to take the determinant.
Thus for “varieties” against “error” the example of Section 7 gives

\[
\left|\, \theta \begin{pmatrix} 6067 & 3352 \\ 3352 & 6899 \end{pmatrix}
- \begin{pmatrix} 3279 & 802 \\ 802 & 3977 \end{pmatrix} \right| = 0
\]


in the first system of coordinates. When written out, this is a quadratic 
equation and will have two roots. The roots are independent of the 
choice of the coordinate system, and there will in general be one for 
each dimension. 

The distribution of the roots $\theta_i$ is known for all dimensions when
the sums of squares are sums of squares of normally distributed vectors,
independent, and each of the same variance. It can be found, e. g., on
page 268 of Wilks’ book [12]. In our case, we know that (10.4) and
(10.5) are sufficient to make these conditions hold, so let us examine the
distribution. We will consider only the case of two dimensions, where
the probability density is

\[ (\text{Constant}) \cdot (1-\theta_1)^{\frac{n_1-3}{2}}\,\theta_1^{\frac{n_2-3}{2}}\,(1-\theta_2)^{\frac{n_1-3}{2}}\,\theta_2^{\frac{n_2-3}{2}}\,(\theta_2-\theta_1)\,d\theta_1\,d\theta_2, \qquad 1 \ge \theta_2 \ge \theta_1 \ge 0, \]

where $n_1$ and $n_2$ are the numbers of degrees of freedom. (Note that
(i) Wilks’ $n$ are each larger by a unit; and (ii) he gives the value
of the constant.)
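The roots $G_1$, $G_2$ of

\[ |\,G\,(\text{error mean square}) - (\text{row mean square})\,| = 0, \]

are given by

\[ G_1 = \frac{(r-1)(c-1)(1-\theta_1)}{(r-1)\,\theta_1}, \qquad G_2 = \frac{(r-1)(c-1)(1-\theta_2)}{(r-1)\,\theta_2}, \]

in complete analogy to the relation between $\theta$ and $F$ in one dimension.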
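The passage between the two scales is a one-line computation in either direction. The sketch below is illustrative only; the function names and the argument names `effect_df` (for $r-1$) and `error_df` (for $(r-1)(c-1)$) are assumptions introduced here.

```python
def g_from_theta(theta, effect_df, error_df):
    """G (or F in one dimension) corresponding to a root theta."""
    return error_df * (1.0 - theta) / (effect_df * theta)


def theta_from_g(g, effect_df, error_df):
    """Inverse relation: the root theta corresponding to a given G."""
    return error_df / (error_df + effect_df * g)


# Round trip on 5 and 20 degrees of freedom.
theta = theta_from_g(6.14, effect_df=5, error_df=20)
g_back = g_from_theta(theta, effect_df=5, error_df=20)
print(round(theta, 4), round(g_back, 2))
```

A large value of $G$ corresponds to a small value of $\theta$, so that upper percentage points of one are lower percentage points of the other.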
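As an example, let us compute $G_1$ and $G_2$ for varieties versus error
in the example of Section 7.
In the first coordinate system,

\[ \left|\; G \begin{pmatrix} 164 & 40 \\ 40 & 199 \end{pmatrix} - \begin{pmatrix} 697 & 637 \\ 637 & 731 \end{pmatrix} \right| = 0, \]

or

\[ \begin{vmatrix} 164G - 697 & 40G - 637 \\ 40G - 637 & 199G - 731 \end{vmatrix} = 0. \]

This yields

\[ 31036\,G^2 - 207627\,G + 103738 = 0, \]
\[ G = 3.34508 \pm 2.8014 = 6.1464 \text{ and } 0.5436. \]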
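For two components the determinantal equation is a quadratic, and its coefficients can be written down directly from the two matrices. The sketch below is an illustration, not the original computing procedure; it assumes the error mean squares to be 164, 40, 199 and the variety mean squares to be 697, 637, 731 (values consistent with the quadratic just given), and the function name is hypothetical. The roots it produces agree with those quoted above to about three decimal places.

```python
import math


def determinantal_roots(error_ms, effect_ms):
    """Solve |G * error_ms - effect_ms| = 0 for symmetric 2x2 matrices.

    Returns the quadratic coefficients (a, b, c) of a*G^2 + b*G + c = 0
    and the two roots, larger first.
    """
    e, v = error_ms, effect_ms
    a = e[0][0] * e[1][1] - e[0][1] * e[1][0]
    cross = (e[0][0] * v[1][1] + e[1][1] * v[0][0]
             - e[0][1] * v[1][0] - e[1][0] * v[0][1])
    c = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    root_disc = math.sqrt(cross * cross - 4 * a * c)
    roots = ((cross + root_disc) / (2 * a), (cross - root_disc) / (2 * a))
    return (a, -cross, c), roots


error_ms = [[164, 40], [40, 199]]       # assumed error mean squares
variety_ms = [[697, 637], [637, 731]]   # assumed variety mean squares
coefficients, (g1, g2) = determinantal_roots(error_ms, variety_ms)
print(coefficients)  # (31036, -207627, 103738)
print(g1, g2)
```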


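In the second coordinate system,

\[ \left|\; G \begin{pmatrix} 111 & -9 \\ -9 & 71 \end{pmatrix} - \begin{pmatrix} 676 & -8 \\ -8 & 38 \end{pmatrix} \right| = 0. \]

This yields

\[ 7800\,G^2 - 52070\,G + 25622 = 0, \]
\[ G = 3.3378 \pm 2.8029 = 6.1407 \text{ and } 0.5349. \]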


The difference in results is due to rounding the mean squares to integers 
when changing coordinates. 


12. Levels of significance 


We have just calculated, from actual data, a pair of values of $G_1$
and $G_2$. Are they significant on 5 and 20 degrees of freedom? This
section will be devoted to working out the details of the answer to this,
to show the reader that, even if the 5 per cent and 1 per cent levels
of $G_1$ and $G_2$ are not tabled, they can be obtained without too much
labor. Some readers may wish to skip to Section 14.


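The distribution in question is

\[ A\,(1-\theta_1)\,\theta_1^{17/2}\,(1-\theta_2)\,\theta_2^{17/2}\,(\theta_2-\theta_1)\,d\theta_1\,d\theta_2, \qquad 1 \ge \theta_2 \ge \theta_1 \ge 0, \]

and we shall be concerned with the individual distributions of $\theta_1$ and $\theta_2$,
since no detailed study has been made to show us that this is unwise.
If we integrate out $\theta_1$, we obtain for the distribution of $\theta_2$

\[ A\,(1-\theta_2)\,\theta_2^{17/2}\,d\theta_2 \left[ \frac{2}{19}\,\theta_2^{21/2} - \frac{2}{21}\,\theta_2^{21/2} - \frac{2}{21}\,\theta_2^{23/2} + \frac{2}{23}\,\theta_2^{23/2} \right] \]
\[ = A \left[ \frac{4}{19 \cdot 21}\,\theta_2^{19} - \frac{8}{19 \cdot 23}\,\theta_2^{20} + \frac{4}{21 \cdot 23}\,\theta_2^{21} \right] d\theta_2, \]

whence

\[ P(\theta_2 < x) = A \left[ \frac{4x^{20}}{19 \cdot 20 \cdot 21} - \frac{8x^{21}}{19 \cdot 21 \cdot 23} + \frac{4x^{22}}{21 \cdot 22 \cdot 23} \right]. \]

Now putting $x = 1$,

\[ 1 = A \left[ \frac{4}{19 \cdot 20 \cdot 21} - \frac{8}{19 \cdot 21 \cdot 23} + \frac{4}{21 \cdot 22 \cdot 23} \right] = \frac{24A}{19 \cdot 20 \cdot 21 \cdot 22 \cdot 23}, \]

it follows that $A$ is determined, and

\[ P(\theta_2 < x) = \frac{x^{20}\,(506 - 880x + 380x^2)}{6} = x^{20}\,\frac{\{253 - 440x + 190x^2\}}{3}. \]

A few trials yield

   x        x^20        { }/3      Product
  .9       .12156       3.633      .441
  .8       .011529      7.533      .0869
  .77      .0053679     8.95033    .048045
  .7715    .0055821     8.87676    .049551

and a close approximation to the lower 5 per cent point of $\theta_2$ is .77194.
The corresponding upper 5 per cent point of $G_2$ is

\[ \frac{20\,(.22806)}{5\,(.77194)} = 1.18. \]

Thus the value of $G_2$ which we obtained is not significant at the 5 per
cent point.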
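The "few trials" can be replaced by a systematic bisection. The following is a minimal sketch, assuming the cumulative distribution of the larger root on 5 and 20 degrees of freedom to be $x^{20}(253 - 440x + 190x^2)/3$ as obtained above; the function names and the tolerance are illustrative choices, not part of the original work.

```python
def cdf_theta2(x):
    """P(theta_2 < x) for 5 and 20 degrees of freedom."""
    return x ** 20 * (253.0 - 440.0 * x + 190.0 * x * x) / 3.0


def lower_point(cdf, probability, tol=1e-10):
    """Bisect on [0, 1] for the x at which an increasing cdf equals probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < probability:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta2_05 = lower_point(cdf_theta2, 0.05)
g2_05 = 20 * (1 - theta2_05) / (5 * theta2_05)   # upper 5 per cent point of G2
print(theta2_05, g2_05)
```

The same routine serves for any other probability level once the cumulative distribution has been written down.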

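Returning to the original distribution and integrating out $\theta_2$, we
obtain the distribution of $\theta_1$

\[ A\,(1-\theta_1)\,\theta_1^{17/2}\,d\theta_1 \left[ \frac{4}{21 \cdot 23} - \frac{4}{19 \cdot 21}\,\theta_1 + \frac{4}{19 \cdot 21}\,\theta_1^{21/2} - \frac{4}{21 \cdot 23}\,\theta_1^{23/2} \right] \]
\[ = A \left[ \frac{4}{21 \cdot 23}\,\theta_1^{17/2} - \frac{8}{19 \cdot 23}\,\theta_1^{19/2} + \frac{4}{19 \cdot 21}\,\theta_1^{21/2} + \frac{4}{19 \cdot 21}\,\theta_1^{19} - \frac{8}{19 \cdot 23}\,\theta_1^{20} + \frac{4}{21 \cdot 23}\,\theta_1^{21} \right] d\theta_1, \]

whence

\[ P(\theta_1 < y) = A \left\{ \frac{8y^{19/2}}{19 \cdot 21 \cdot 23} - \frac{16y^{21/2}}{19 \cdot 21 \cdot 23} + \frac{8y^{23/2}}{19 \cdot 21 \cdot 23} + \frac{4y^{20}}{19 \cdot 20 \cdot 21} - \frac{8y^{21}}{19 \cdot 21 \cdot 23} + \frac{4y^{22}}{21 \cdot 22 \cdot 23} \right\}. \]

Now putting $y = 1$,

\[ 1 = A \left\{ \frac{8}{19 \cdot 21 \cdot 23} - \frac{16}{19 \cdot 21 \cdot 23} + \frac{8}{19 \cdot 21 \cdot 23} + \frac{4}{19 \cdot 20 \cdot 21} - \frac{8}{19 \cdot 21 \cdot 23} + \frac{4}{21 \cdot 22 \cdot 23} \right\} = \frac{24A}{19 \cdot 20 \cdot 21 \cdot 22 \cdot 23}. \]

Thus $A$ is found to have the same value as before, as it should, thus
providing a partial check on the arithmetic. We now have

\[ P(\theta_1 \le y) = \frac{440\,y^{19/2}(1-y)^2 + y^{20}\,(506 - 880y + 380y^2)}{6} \]
\[ = y^{19/2}\,\frac{220\,(1-y)^2}{3} + y^{20}\,\frac{\{253 - 440y + 190y^2\}}{3}. \]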


A few trials yield

   y       y^(19/2)     220(1-y)^2/3   Product     y^20           { }/3    Product     Sum
  .5      .0013810     18.333         .02539      .00000095      26.83    .00002      .02541
  .55     .0034152     14.85          .05071                     22.82    .00015      .05086
  .5483   .0033166     14.9622        .04962      .0000060       22.96    .00014      .04976

and a close approximation to the lower 5 per cent point of $\theta_1$ is .54867.
The corresponding upper 5 per cent point of $G_1$ is

\[ \frac{20\,(.45133)}{5\,(.54867)} = 3.29. \]

Our value of over 6 is thus well beyond the 5 per cent point. A few
more trials yield

   y       y^(19/2)     220(1-y)^2/3   Product     y^20           { }/3    Product     Sum
  .45     .00050857    22.183         .01123      small
  .447    .00047627    22.426         .010681     small
  .443    .00043732    22.7516        .0099497    .00000008471   31.79    .0000027    .0099504

and a close approximation to the lower 1 per cent point of $\theta_1$ is .44326.
The corresponding upper 1 per cent point of $G_1$ is

\[ \frac{20\,(.55674)}{5\,(.44326)} = 5.02. \]

Our value of over 6 is thus highly significant.

The computations for any pair of degrees of freedom one of which
is odd, will give only the amount of trouble that this example gave.
If both degrees of freedom are even, a trigonometric substitution and
the use of some reduction formulas will be added. While a table of
5 per cent and 1 per cent points would be a great help, direct calcula-
tion is still quite feasible.


13. Very incomplete tables

The simplest cases to compute are those with $n_1 = 3$, and, in order
to permit a comparison with the $F$ distribution, a relatively complete
set of percentage points have been calculated for $n_1 = 3$, $n_2 = 7$. They
are given in Table 8.

TABLE 8

Probability levels of F, G1 and G2 for 3 and 7 degrees of freedom

PROBABILITY (%)       F          G1         G2

     99.5           .0225      .1182
     99             .0361      .2461      .0034
     95             .1122      .4860
     90             .1899      .6804      .0354
     50                       2.0670      .2429
     10            3.0740     5.4652      .9092
      5            4.3469     8.8804     1.2463
      1            8.4512    17.5415     2.1716
      0.5         10.8822    22.901      2.6406




To this we can add a table of the 5 per cent and 1 per cent points
of $G_2$ for the case $n_1 = 3$, which follows (see Table 9).

TABLE 9

5% and 1% levels of G2 for n1 = 3

  n2     5% POINT    1% POINT
   3       1.71        3.63
   4       1.48        2.99
   5       1.37        2.52
   6       1.30        2.31
   7       1.24        2.17
   8       1.21        2.08
   9       1.18        2.02
  10       1.16        1.95
  12       1.13        1.88
  16       1.10        1.78
  24       1.07        1.69
  48       1.03        1.63

14. The value of dyadic significance testing 


We now know that it is possible to make dyadic significance tests, 
although at the expense of somewhat more labor (even if tables were 
available). For we must solve the characteristic equation. What do 
we gain in return? 

We saw in an earlier section that the dyadic analysis gave informa- 
tion about quantities ignored in the conventional analysis with one 
more classification. In our simple example of the uniformity trials this 
extra information concerned : 


(i) the covariance of the year-to-year difference and the average 
yield taken from plot to plot, 

(ii) the covariance of the error in difference of yields and the error 
in average yields (which could arise from uncorrelated plot 
errors if the variance were different in the two years). 


The dyadic significance tests take these possibilities into account. Thus 
if the error variance were really different in the two years, the analyst 
who wishes to make a test against error has three possibilities: 


(a) he may use the conventional analysis with an additional classi-
fication (years); he pays for this decision by accepting an
unknown distortion of his significance levels and of the sensi-
tivity of his test by the failure of the error variance to follow
the hypothesis on which his significance test is based;

(b) he may analyze the two years separately; he pays for this
decision in two ways, by halving the sensitivity of his analysis
and by failing to have any “accurate” basis for judging whether
the effects studied, in case they are found significant in both
years, are consistent from year to year;

(c) he may make a dyadic analysis; he pays for this decision in a 
slight increase in computational labor and a slight loss in 
sensitivity. 


We are not prepared to tell the analyst how to choose in such a situa- 
tion—his choice must depend on all he knows about the problem, and 
may have to depend, as well, on such limitations of time and computing 
facilities as he may suffer under. It seems to the writer, however, that 
there will be many cases in practice where the analyst will choose the 
third approach and use dyadic anova and its significance tests. 

In discussing the costs of using the dyadic analysis, we mentioned 
a loss in sensitivity; such a loss is inevitable when we broaden the 
scope of our null hypothesis. It would be very helpful to the analyst, 
faced with the three-fold decision above, if he knew 


(1) how much sensitivity he loses by using the dyadic analysis, 


(2) how much the operating characteristic of the conventional analy- 
sis is disturbed by, for example, different error variances in 
different years, and how it is disturbed. 


At the present state of the theory, we can offer no guidance on these
points at all. The intuition of the analyst must work unassisted for
the present.


15. Further discussion of the example 
We have carried along one test of significance from our first example
(Section 7), namely

varieties against error, 5 against 20 df,

where we found

$G_1 = 6.14$, $G_2 = 0.54$.

These correspond to values of $\theta_1 = 0.3945$ and $\theta_2 = 0.8811$ respectively,
and substitution in the cumulative distribution functions of Section 12
gives probability levels of 0.0024 for $\theta_1$ and 0.331 for $\theta_2$. Since the
correspondence of $G$ and $\theta$ interchanges lower with upper, we find $G_1$
0.0024 of the way from the top of its distribution on the null hypothesis
and $G_2$ 0.331 of the way from the top of its own. How, then, shall we
interpret these results?

There is little doubt of the significance of $G_1$, and we must inter-
pret this as meaning that there is an effect of variety. Now for $G_2$,
if we had so many replications that we had implicit confidence in the
variety averages, we might plot these averages, using, indifferently,
either the individual year’s results, or the sum and difference of these
results as coordinates. Then two possibilities would clearly arise: (i)
the population variety means fall on a line, (ii) the population variety
means do not fall on a line. In case (i) it would be expected that the
distribution of $G_2$ would be little different from its value on the null
hypothesis. How little different? The answer to this problem lies in
the distribution problem mentioned in the next section, and so far is
unanswered. On a tentative basis, then, we interpret the value of $G_2$
as giving no evidence that the variety averages do not fall along a
straight line.


Let us recall the estimated $\sigma^2$ for variety; it was


0 
0 
which is surely consistent with the interpretation above. It indicates 
that the line of variety averages is nearly parallel to one axis for 
average effect; it is a line along which both years’ yields change nearly 
together and the year-to-year difference changes little if at all. 

We might like to test whether this line of averages is parallel to the 
axis, but no answer is given here. We have brought up three questions 
about the effect of varieties: 


(a) Is there an effect of variety? 

(b) Do the variety effects lie on a line? 

(c) If the variety effects lie on a line, does this line have a given 
direction ? 


It is important to remember that the sensitivity of our analysis
decreases rapidly as we go down the list. A few replications may
answer (a) fairly well; many more will be needed to get a grip on (b),
and still more will be needed to grasp (c) effectively.


IV. FIDUCIAL STATEMENTS 


16. A short horse 


The possibility of making fiducial or confidence statements in con- 
ventional anova usually rests on the additional hypotheses: 


(16.6) the distribution of row, column and cell effects is jointly normal, 


(16.7) the mean value of all row effects is the same; the mean value 
of all column effects is the same (though not necessarily the 
same as the row effects), 


(16.8) the variance of all row effects is the same; the variance of all 
column effects is the same (though not necessarily the same as 
the row effects). 


Under these hypotheses there is no doubt of our ability to make, in
principle, confidence statements in the general case of dyadic anova,
but we are far from doing it practically. In the first place, the relation
of two dyadic variances need not be that of simple scalar proportionality.
The distribution of the roots of

\[ |\,K\,(\text{error mean square}) - (\text{effect mean square})\,| = 0 \]

is none other than the distribution of a constant times $G_1$ and the same
constant times $G_2$ whenever the two component variances are in simple
scalar proportion. Otherwise we get, for each root and each pair of
degrees of freedom, a one-parameter family of distributions. While these
distributions may be known, the writer can give no reference to their
explicit formulation.

Much mathematics and tabulation will have to be done before we 
can make fiducial statements about dyadic component variances on a 
simple, practical basis. 


V. CHOICE OF TERMS 
17. Introduction

When statistical methods for handling many variables together are
developed, they frequently find one of their most important uses in
treating variables linked by an exact, non-linear functional relation.
The classical example is multiple regression, where the calculation of
curved regression lines by multiple regression on, say, various powers
of the independent variable is one of its more frequent and important
uses. We are about to investigate an application of dyadic anova where
the same phenomenon will occur.

A problem which has been badly neglected in many fields where 
conventional anova has been used—I need name only agriculture, 
chemical engineering and sensory physiology—is the choice of terms. 
The dictum of Fisher on this point, “The experimenter, for example, 
has a perfect right to measure the efficiency of different feeding stuffs, 
either by the average percentage increase of different animals, or by the 
average absolute increase, as he pleases, and, with a properly designed 
experiment, he will ascertain whether the materials tested do or do not 
give significantly different results as measured in these alternative ways”
(The Design of Experiments [4, Section 56]) seems to the writer to be
misleading. There is need here for a careful distinction between experi-
ments with a purely technological purpose, and experiments with a
scientific purpose. In a purely technological experiment, it can often
be argued that the terms on which the average effect is wanted have been
fixed by the problem. To carry Fisher’s example into more detail, it
may be that the real question is this: Given that farmers are going to 
feed a certain breed of animal and, after feeding, they will receive a 
certain sum per pound of live weight—knowing that the costs of feeding 
the alternative rations are equal (or neglecting the difference), to deter- 
mine by experiment which ration should be used. Now it can be argued 
that the experiment should be carried out by 


(i) choosing experimental animals with an initial weight distribu- 
tion resembling that which the farmer will have, 


(ii) analyzing the average absolute increase, 


since in this way we measure most directly the pounds of increase, and 
hence the dollars of income which the farmer will receive. 

In a technological situation this position is surely arguable, yet 
there is a strong counterargument. It has been our broad experience 
that it is safer and more efficient to found technology on science rather 
than on empiricism ; the study and application of general principles has 
been worth its salt. One minor way in which it pays can be cited in 
our example. It is not unlikely that the initial weight distribution 
used by the farmer will vary from time to time or from region to region. 
As soon as this variation is large enough, it will not pay us to have used
the analysis which tells us most about a single situation. When it
comes to judging whether or not there will be variation enough, the
argument is likely to be hot and heavy.

If the experiment is a scientific one, intended primarily to deepen 
our knowledge of what is going on, then there is no basis for argument; 
we should use the analysis which gets the most basic information out 
of the data. The scientific experiment may have a very technological 
basis—it may be to determine at what temperature a melt of optical 
glass should be held in order to get satisfactory quality as quickly as 
possible—yet if a more accurate insight into the effect of the auxiliary 
variables comes out from an analysis of “log hours” or “ hours?” rather 
than “hours,” there is little room for doubt as to the best procedure. 

In the field of experimental design, the corresponding principles 
have been emphasized by Fisher and are recognized and practiced by 
most practicing statisticians. To quote again from The Design of 
Experiments [4, Section 39], “. . . any conclusion, such as that it is 
advantageous to increase the quantity of a given ingredient, has a wider 
inductive basis when inferred from an experiment in which the quan- 
tities of other ingredients have been varied, than it would have from 
any amount of experimentation, in which these had been kept strictly 
constant.” The great gain in this direction which can be made by 
the choice of experimental conditions is familiar, yet the lesser gains, 
lesser but far from negligible, which can be made from a choice of terms 
for analysis have been all but neglected. 

The importance of a choice of terms in the past development of the 
physical sciences is a fact which is rarely taught to the student, for the 
teacher is so used and firmly attached to the terms which are helpful 
that he overlooks the importance of their discovery and use. Two 
examples should be mentioned: Count Rumford’s discovery that the 
quantity of heat should not be measured by the temperature alone, and 
Rydberg’s discovery that a rational relation between spectral lines could 
be found if the wave length were abandoned and its reciprocal, the 
wave number were used instead. These were the necessary preliminaries 
to the development of thermodynamics and quantum mechanics! 

While it is unlikely that a change in terms in the interpretation of 
agricultural field experiments or in clinical psychology will have com- 
parable repercussions, yet the example of the well established sciences 
suggests that inquiry is likely to be profitable. 

Leaving the question of improvement in basic understanding, let
us pass to the more directly observable question of efficiency. When it
comes to a choice between lattice squares and balanced incomplete blocks,
a matter of 5 per cent or 10 per cent in efficiency is highly esteemed.
Yet I am certain, as a result of experience with actual data, that
there are many places where a change in terms will gain 50 per cent
or 100 per cent in efficiency. How can this be?


It is a fact, regrettable perhaps but true, that in most well organized 
and designed experiments the accuracy of the important conclusions is 
limited, not by the errors of exact replication but by the interactions. 
The proper attention to broadening the basis of inference has brought 
with it the really fundamental source of errors of inference—the failure 
of different cases to respond alike. 

If we can find a way to fairly and consistently reduce the size of
the interactions compared to the effects, we have improved the efficiency
and accuracy of the experiment in a direct, measurable and important
way. The choice of terms offers us a way to do this, if the original
terms were not the best. Consider a simple numerical example: Under
conditions where experimental errors can be regarded as negligible, the
wave lengths in Angstroms of the lines of the hydrogen spectrum con-
necting five given upper states (7, 8, 9, 10, 11) with two given lower
states (3, 2) were found to be:


10049.8 9546.2 9229.7 9015.3 8863.4 
3970.07 3889.01 3835.40 3797.91 3770.06, 


while the corresponding wave numbers (reciprocals of wave lengths)
are, in cm$^{-1}$,


9950.4 10475.3 10834.6 11092.3 11282.4 
25188.5 25713.2 26072.9 26330.3 26524.8. 


Analyses of variance for the two tables yield,

                              MEAN SQUARES

                    df      Wave length       Wave number
Upper states         4          610 512      2 246 531.00
Lower states         1       75 305 788    580 578 469.85
Interaction          4           76 609              1.87
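The collapse of the interaction on the wave number scale can be checked directly from the two small tables. What follows is a sketch under the usual row-by-column model without replication; the function name is an assumption, and since the arithmetic is carried out afresh from the figures as printed, the results need not agree with the table above to the last digit.

```python
def interaction_mean_square(table):
    """Interaction mean square of a two-way table without replication."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    ss = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(rows) for j in range(cols))
    return ss / ((rows - 1) * (cols - 1))


# Rows: lower states 3 and 2; columns: upper states 7, 8, 9, 10, 11.
wave_lengths = [
    [10049.8, 9546.2, 9229.7, 9015.3, 8863.4],
    [3970.07, 3889.01, 3835.40, 3797.91, 3770.06],
]
wave_numbers = [
    [9950.4, 10475.3, 10834.6, 11092.3, 11282.4],
    [25188.5, 25713.2, 26072.9, 26330.3, 26524.8],
]

ms_lengths = interaction_mean_square(wave_lengths)
ms_numbers = interaction_mean_square(wave_numbers)
print(ms_lengths, ms_numbers)
```

On the wave length scale the interaction mean square is of the order of tens of thousands; on the wave number scale it is of the order of unity.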


The customary method of assessing anova data would state that the
difference due to a change of upper state from 8 to 10 was, on a wave
length scale,

$-316 \pm 417$ Å

or, on a wave number scale,

$+617.2 \pm 2.1$ cm$^{-1}$.


The increase in precision is clear. With this lengthy introduction let 
us approach a possible solution of the problem of choice of terms in 
anova. 


18. Reducing interaction in a row-by-column analysis 


We begin with the simplest case, a row-by-column analysis without 
replication in the cells, where the effect of rows or of columns must 
be tested against interaction (alternatively called discrepance or error). 
We seek to choose a scale best suited for such tests. It is clear that a 
single anova table, or a dozen such tables cannot be expected to deter- 
mine a firm and exact scale, any more than one sample, or a dozen 
samples, determine exactly the mean of the population. As a practical 
matter we must limit our choice of scales. The simplest limitation, 
and one which should be adequate during the initial development of 
the subject, is to restrict ourselves to the linear combinations 


\[ y = a f(x) + b g(x) \]

of two well-chosen functions $f(x)$ and $g(x)$.

What should we mean by well-chosen? Often because of vague 
theories or experiences in related situations, we are able to propose one 
or two functions which seem reasonable. If so, we would use these. 
Otherwise, we can only do as good a job of guessing as possible. 

Now it is desirable, after finding “ good” terms, to state, in these 
terms, the average effects of changing, say, from the 3rd row to the 5th 
row, and to attach to this average effect an honestly computed standard 
error (or to make some more refined fiducial statement). In the present 
state of our tabulated knowledge of the distributions involved, this 
does not seem practical when the terms are chosen with the aid of the 
anova table analyzed. We shall, for the present, have to assume that 
the choice of terms is based on one set of anova tables and is applied 
in others. This makes the extraction of good examples from the litera-
ture difficult, and may seem to hamper the practical use of the technique
unduly. This does not seem to me to be so, for it is rare that a single
experiment is so extensive as to usefully determine the terms satis- 
factorily, without being large enough to be split into independent 
portions which will give more or less independent choices of scale, the 
mutual consistency of which can be examined. 

We have now discussed what end we wish to approach; we gave away 
long ago the fact that we propose to do it within the framework of 
dyadic anova ; there remains the choice of criterion. On purely heuristic 
grounds we shall take the ratio 


\[ \frac{SSR_y + SSC_y}{SSB_y} \]

of the row sum of squares plus the column sum of squares to the balance
sum of squares. It is our task to choose the ratio of $a$ to $b$ in

\[ y = a f(x) + b g(x) \]

so as to maximize this ratio. If we introduce the vector variable

\[ z = (f(x),\, g(x)) \]

then $y$ is just an unspecified component of $z$. The directions in which
the ratio of sums of squares are greatest and least, as well as these
extreme ratios, are determined by the determinantal equation

\[ |\,\lambda\,SSB_z - SSR_z - SSC_z\,| = 0. \]


The extreme ratios are the determinantal roots, while the directions 
are given by the corresponding “latent vectors.” We shall now illus- 
trate the situation on an example. 


19. An example 


It is of course natural to analyze an example which does not involve 
too many numbers and which shows a large effect due to change of 
terms. A convenient example is the comparison of potencies of two 
preparations of tobacco mosaic virus by Youden and Beale [14]. This 
example is discussed, with slightly modified numbers, by Snedecor in 
Sections 2.13 and 11.5 of his book [9]. Since the original data were 
not easily available when this section was written, the computations 
involve the modified data. We have the number of lesions, $x$, for each
combination of two preparations and eight plants. Having no reason
to do otherwise, we choose $f(x) = x$ and $g(x) = x^2$, with the result
that the data and the dyadic anova take the forms set out in Tables 10
and 11.
TABLE 10

Youden and Beale’s data (as modified by Snedecor)

              PREPARATION 1        PREPARATION 2
PLANT         f(x)     g(x)        f(x)     g(x)
  1             9       81          10      100
  2            17      289          11      121
  3            31      961          18      324
  4            18      324          14      196
  5             7       49           6       36
  6             8       64           7       49
  7            20      400          17      289
  8            10      100           5       25

TABLE 11

Dyadic anova of the data in Table 10

                 DF            SS                      MS
Treatments        1        64     2256             64     2256
                         2256    79399           2256    79399
Plants            7       575    17986             82     2569
                        17986   590509           2569    84358
Balance           7        65     3078            9.3      440
                         3078   155140            440    22163


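The determinantal equation we are to consider, therefore, is

\[ \left|\; \lambda \begin{pmatrix} 65 & 3078 \\ 3078 & 155140 \end{pmatrix} - \begin{pmatrix} 639 & 20242 \\ 20242 & 669908 \end{pmatrix} \right| = 0, \]

whence

\[ 610016\,\lambda^2 - 18068728\,\lambda + 18332648 = 0, \]

and

\[ \lambda = 28.5681,\ 1.0520, \quad \text{(on 8 and 7 df)}. \]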
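The roots, and the combination of $f(x)$ and $g(x)$ belonging to the larger one, can be obtained mechanically. The sketch below assumes the balance sums of squares to be 65, 3078, 155140 and the sums of squares for treatments plus plants to be 639, 20242, 669908; the function name and the normalization (coefficient of $f$ set to one) are illustrative choices.

```python
import math


def extreme_ratios(balance, effects):
    """Roots of |lambda * balance - effects| = 0 for symmetric 2x2 matrices."""
    a = balance[0][0] * balance[1][1] - balance[0][1] ** 2
    b = (balance[0][0] * effects[1][1] + balance[1][1] * effects[0][0]
         - 2 * balance[0][1] * effects[0][1])
    c = effects[0][0] * effects[1][1] - effects[0][1] ** 2
    root_disc = math.sqrt(b * b - 4 * a * c)
    return (a, -b, c), ((b + root_disc) / (2 * a), (b - root_disc) / (2 * a))


balance_ss = [[65, 3078], [3078, 155140]]
effects_ss = [[639, 20242], [20242, 669908]]   # treatments plus plants

coefficients, (larger, smaller) = extreme_ratios(balance_ss, effects_ss)

# Direction for the larger root: a vector annihilated by the first row of
# (larger * balance - effects), scaled so that the coefficient of f is 1.
m11 = larger * balance_ss[0][0] - effects_ss[0][0]
m12 = larger * balance_ss[0][1] - effects_ss[0][1]
coefficient_of_g = -m11 / m12

print(coefficients)   # (610016, -18068728, 18332648)
print(larger, smaller, coefficient_of_g)
```

The coefficient of $g$ comes out close to $-.018$, so that the combination with the largest ratio is very nearly $x - .018x^2$.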


Now if we substitute 28.6 for $\lambda$ in the determinantal equation (this
is a close enough approximation to the larger root), we obtain, approxi-
mately,

\[ \begin{vmatrix} 1220 & 67789 \\ 67789 & 3767096 \end{vmatrix} = 508599 \doteq 0, \]

and we notice that,

\[ (1220,\ 67789) \doteq 1220\,(1,\ 55.56). \]

Since this is the result of substituting $\lambda = 28.6$, the vector (1, 55.56)
corresponds to the other determinantal root. We want to choose a
combination of $f(x)$ and $g(x)$ which will not involve this vector at all,
and so we use some multiple of

\[ y = f(x) - \frac{g(x)}{55.56}, \]

where we have changed the sign and divided by 55.56, so that increasing
$f(x)$ by $h$ and $g(x)$ by $55.56h$, that is, changing $z$ by $h$ times (1, 55.56),
will have no effect on $y$.
To a satisfactory degree of approximation

\[ y = x - .018x^2 \]


and has the following analysis of variance,

                  DF      SS      MS

Treatments         1     8.5     8.5
Balance            7     4.5     0.64

as may be directly computed from the dyadic anova table by the relation

\[ S y^2 = S f^2 - .036\,S fg + .000324\,S g^2. \]
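The relation amounts to evaluating a quadratic form in the coefficients $(1, -.018)$ on each dyadic sum of squares. A minimal sketch follows, taking the entries of Table 11 as 64, 2256, 79399 for treatments, 575, 17986, 590509 for plants and 65, 3078, 155140 for balance; the function name is hypothetical.

```python
def scalar_sum_of_squares(dyadic, a, b):
    """Sum of squares of y = a*f + b*g from the dyadic sums for (f, g)."""
    return (a * a * dyadic[0][0]
            + 2 * a * b * dyadic[0][1]
            + b * b * dyadic[1][1])


treatments = [[64, 2256], [2256, 79399]]
plants = [[575, 17986], [17986, 590509]]
balance = [[65, 3078], [3078, 155140]]

a, b = 1.0, -0.018          # y = x - .018 x^2
ss_treatments = scalar_sum_of_squares(treatments, a, b)
ss_plants = scalar_sum_of_squares(plants, a, b)
ss_balance = scalar_sum_of_squares(balance, a, b)
print(ss_treatments, ss_plants, ss_balance)
```

No return to the individual observations is needed; the dyadic table carries all the information required for any linear combination of $f$ and $g$.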


We must bear carefully in mind that it is unfair to test the effective-
ness of a change in scale on the data from which it was chosen, yet
we shall do just that. We have the following results:

                                 For x     For y     Ratio

Treatment SS / Balance SS         .99      1.88      1.90
Plants SS / Balance SS           8.85      26.4      2.98


If the terms of $y$ had been chosen on independent evidence, we would
have found apparent increases in efficiency of 90 per cent in detecting
differences between preparations and 200 per cent in detecting differ-
ences in the response of different plants. If the real increases in effi-
ciency are a quarter or a fifth of this amount, they are large and
important.

The value of such changes in terms, and of this method for selecting 
the new terms, must rest on continued experience with actual data; its 
value will, of course, vary from field to field. 


20. Further study of the example 


Since the last section was written, the writer has examined the 
original paper of Youden and Beale [14]. Unfortunately for a really 
adequate check, they give full data on only one experiment, which 
involves eight plants with six leaves. The data quoted by Snedecor 
are for Leaf 2 on each plant. There are five more leaves, and while 
they cannot strictly serve as an independent check of the usefulness of 
our suggested change in terms, they will serve as a reasonable facsimile 
of such a check. 

The computations have been carried out rather crudely, by replacing
the number of lesions, $x$, by $x - .018x^2$, expressed to the nearest integer!
The resulting correspondence is:

x             0  1  2  3  4  5  6  7  8  9 10 11 13 15 16 17 18 19 20 21 23 26 40
x - .018x^2   0  1  2  3  4  5  5  6  7  8  8  9 10 11 11 12 12 12 13 13 13 14 11
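The behavior of the transformation at large counts is easily examined. The sketch below is only an illustration; the function name is an assumption. A quadratic of this form rises to a maximum at $x = 1/(2 \times .018)$, or about 27.8, and falls thereafter, which is why the largest count in the table receives a smaller value than its neighbors.

```python
def transform(x, k=0.018):
    """The change of terms y = x - k * x^2."""
    return x - k * x * x


turning_point = 1.0 / (2 * 0.018)   # count at which y stops increasing

for count in (10, 26, 40):
    print(count, round(transform(count), 2))
print(round(turning_point, 2))
```

Counts beyond the turning point are therefore outside the range over which the change of terms can sensibly be applied.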


Two things are worth noticing: (1) carrying the results to only the
nearest integer has destroyed some information; (2) the use of the
transformation for $x = 40$ is probably misleading, since increasing values
of $x$ correspond to decreasing values of the transform. If the question
of a good transformation for this data were brought up, the leading
contenders would seem to be:

(a) $\sqrt{x}$,
(b) $\log(1 + x)$,
(c) $\log(a + x)$, where $a$ may be about 4, a pure hunch.

In any event, let us return to the actual computations.
Table 12 lists the total number of lesions for leaf $i$ on eight plants
and the sums of squares for both analyses. The “efficiency” column
contains the ratio of the two values of

\[ \frac{SSR_x + SSC_x}{SSB_x}. \]

TABLE 12

Effect of transformation on sums of squares of virus lesions

                          ORIGINAL DATA                TRANSFORMED DATA          EFFI-
LEAF       LESIONS   Treat's  Plants  Balance     Treat's  Plants  Balance      CIENCY

1            240        49      805     352           0      141      27         216%
2            205                     Basis of transformation
6            194         9      625     304           4      154      60         126%
3             98         9      172      73           1       52      63          34%
5             90        25       56     136          14       32      35         220%
4             68         1       31      37           0       24      32          88%
1 and 6                 58     1430     656           4      295      87         152%
3, 4, 5                 35      259     246          15      108     130          87%

This suggests that for leaves with average lesion numbers of about
25 = 200/8, the transformation increases the efficiency by about 50 per
cent, while for leaves with about half as many lesions, it may reduce
the efficiency slightly. This is not unexpected in view of the fact that
the value $-.018$ was chosen from an analysis on a single leaf position
in a single experiment, where the average lesion number was about 25.
The transformation seems to have done all that can be expected of it.
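The efficiency figures are ratios of the criterion (treatments plus plants, divided by balance) after and before the change of terms. A short sketch follows, using the sums of squares for leaves 1 and 6 as they are read here from Table 12 (49, 805, 352 and 0, 141, 27 for leaf 1; 9, 625, 304 and 4, 154, 60 for leaf 6); these readings and the function names are assumptions.

```python
def criterion(treatments, plants, balance):
    """Ratio of effect sums of squares to the balance sum of squares."""
    return (treatments + plants) / balance


def efficiency(original, transformed):
    """Criterion for the transformed data relative to the original data."""
    return criterion(*transformed) / criterion(*original)


leaf_1 = efficiency(original=(49, 805, 352), transformed=(0, 141, 27))
leaf_6 = efficiency(original=(9, 625, 304), transformed=(4, 154, 60))
print(round(100 * leaf_1), round(100 * leaf_6))
```

Values above 100 per cent indicate that the change of terms has reduced the balance relative to the effects.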


21. Relation to discriminant functions 


Any reader familiar with discriminant function technique has cer-
tainly wondered, by this time, about the relation between the two
techniques. They are the two simplest special cases of a general
procedure and their fields of application are unlikely to overlap; the
general case, of course, has a field of application covering the fields
of both special cases and much more besides.

The traditional discriminant function example involves a one-way
classification of populations and several measurements carried out on
the individuals of samples, one from each population. The problem
is to find a function discriminating as well as possible among the
populations. By the values of this function we refer individuals to
populations, and we wish to do this as accurately, given the initial several
measurements, as we can. To meet the practical difficulties of the
situation, we limit ourselves to linear combinations of the measure-
ments, and we judge ability to discriminate by the ratio of the “between”
sum of squares to the “within” sum of squares.

In our case, we have a multiple classification and a single measure- 
ment, and we seek to improve the sensitivity of the analysis by selecting 
the terms of comparison. We make the same choices of only linear 
combinations, judged by a ratio of sums of squares for the same reasons. 

The general case, then, would be a multiple classification and several 
measurements on examples from each cell. Here we would ask for that 
linear combination of measurements which discriminated “ best.” The 
problem of defining “best” here is not easy, and we shall not go into 
it here. One could clearly use the technique set forth above, but how 
is one to interpret the resulting linear combination? The same problem 
exists in the ordinary use of the discriminating function with one-way 
classification into more than two categories. Is the “sum of squares
between” a satisfactory measure of the separating power? Not in all
cases, for if discriminant functions of unit variance give mean values
to three categories of −4, −3, +7, and −3, 0, +3, respectively,
one will ordinarily prefer the second to the first, although the sum of
squares between, and hence the ratio of sums of squares, is smaller in
the ratio of 18 to 74! Much careful thought must be given to such
questions in the light of general scientific inference before it is worth- 
while to go far with mathematical methods. 
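
The arithmetic behind the ratio of 18 to 74 can be checked with a short sketch. It assumes equal weight for the three categories; the function names, and the "smallest gap" summary offered as one possible reading of why the second set of means would ordinarily be preferred, are illustrative assumptions and not part of the text.

```python
def between_ss(means):
    """Sum of squared deviations of the category means from their average
    (equal weights for the categories are assumed)."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means)


def smallest_gap(means):
    """Smallest separation between neighbouring category means.
    Illustrative only: one way to express how well the worst-separated
    pair of categories is told apart."""
    ordered = sorted(means)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


first = [-4, -3, 7]    # means under the first discriminant function
second = [-3, 0, 3]    # means under the second discriminant function

ss_first = between_ss(first)
ss_second = between_ss(second)
gap_first = smallest_gap(first)
gap_second = smallest_gap(second)

print(ss_second, ss_first)    # sums of squares between
print(gap_second, gap_first)  # closest pair of categories in each case
```

The first function has the larger sum of squares only because one category lies far from the other two, which it scarcely separates from each other.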

The problem of the choice of terms does not escape this dilemma 
in problems of one-way classification. I am not prepared to suggest 
the sort of analysis described above for any of these problems I have met. 
But when two-way and higher classifications enter, and our broad 
experience points a finger at the size of the interaction, in this special 
case the dilemma is resolved and we can reasonably set forth to use the 
analysis outlined in the immediately preceding paragraphs. 


22. Tests of significance 


No mention has been made in this chapter of tests of significance, 
for the very good reason that, so far as the writer knows, they have not 
yet been developed. They probably must wait until the distribution 
of determinantal roots and latent vectors is adequately available in 
the general case. 




APPENDIX 


A. Two identities and a lemma 


We shall write a dot in place of a subscript averaged over, thus
given $x_i$, where $i$ goes from 1 to $n$,

$$x_\cdot = \frac{1}{n} \sum_i x_i$$

(or, as we shall put later reminders, $n x_\cdot = \sum_i x_i$). Now the identity

$$\sum_i x_i^2 = \sum_i (x_i - x_\cdot)^2 + n x_\cdot^2, \tag{A.1}$$

is well-known for numerical values; we establish it for vectors by
squaring and summing

$$x_i = (x_i - x_\cdot) + (x_\cdot),$$

and noticing that sums of cross-products like

$$\sum_i (x_\cdot)(x_i - x_\cdot) = (x_\cdot) \sum_i (x_i - x_\cdot) = 0$$

vanish because one factor can be taken through the summation as a
constant, while the sum of the other is zero by definition.

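If the $x_{ij}$ are thought of in a rectangular array with $i$ running
through $c$ columns, and $j$ running through $r$ rows, then it is natural
to square and sum

$$x_{ij} = (x_{ij} - x_{i\cdot} - x_{\cdot j} + x_{\cdot\cdot}) + (x_{i\cdot} - x_{\cdot\cdot}) + (x_{\cdot j} - x_{\cdot\cdot}) + x_{\cdot\cdot},$$

(where $r x_{i\cdot} = \sum_j x_{ij}$, and $c x_{\cdot j} = \sum_i x_{ij}$, and $rc\, x_{\cdot\cdot} = \sum_{ij} x_{ij}$) to obtain the
identity

$$\sum_{ij} x_{ij}^2 = \sum_{ij} (x_{ij} - x_{i\cdot} - x_{\cdot j} + x_{\cdot\cdot})^2 + r \sum_i (x_{i\cdot} - x_{\cdot\cdot})^2 + c \sum_j (x_{\cdot j} - x_{\cdot\cdot})^2 + rc\, x_{\cdot\cdot}^2. \tag{A.2}$$

The sums of cross-products vanish for the same reasons as before. In
terms of the suggestive notation used earlier, this identity runs

$$SSx = SSBx + SSCx + SSRx + SSMx. \tag{A.2'}$$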
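We shall make repeated use of this

Lemma. If $y_1, y_2, \ldots, y_n$ have means $\mu_1, \mu_2, \ldots, \mu_n$, variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$,
and if in addition all the covariances are equal to $\lambda$, then

$$\operatorname{var}(y_\cdot) = \lambda + \frac{1}{n}(\sigma_\cdot^2 - \lambda), \tag{A.3}$$

$$E\{\sum_i (y_i - y_\cdot)^2\} = (n-1)(\sigma_\cdot^2 - \lambda) + \sum_i (\mu_i - \mu_\cdot)^2, \tag{A.4}$$

where

$$\sigma_\cdot^2 = \frac{1}{n} \sum_i \sigma_i^2$$

and

$$n \mu_\cdot = \sum_i \mu_i.$$

(Notice that we average $\sigma^2$ and not $\sigma$.)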
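We begin by noting that

$$E(y_i^2) = (E y_i)^2 + \operatorname{var}(y_i) = \mu_i^2 + \sigma_i^2$$

and, for $i \neq l$,

$$E(y_i y_l) = (E y_i)(E y_l) + \operatorname{cov}(y_i, y_l) = \mu_i \mu_l + \lambda.$$

Then

$$E(y_\cdot) = \frac{1}{n} \sum_i (E y_i) = \mu_\cdot$$

and

$$E(y_\cdot^2) = \mu_\cdot^2 + \frac{1}{n} \sigma_\cdot^2 + \frac{n-1}{n} \lambda,$$

whence

$$\operatorname{var}(y_\cdot) = E(y_\cdot^2) - \mu_\cdot^2 = \frac{1}{n} \sigma_\cdot^2 + \frac{n-1}{n} \lambda,$$

which is (A. 3).

Now write (A. 1) in terms of $y_i$'s rather than $x_i$'s and take expectations
on both sides;

$$E(\sum_i y_i^2) = E\{\sum_i (y_i - y_\cdot)^2\} + n E(y_\cdot^2)$$

whence

$$\sum_i (\mu_i^2 + \sigma_i^2) = E\{\sum_i (y_i - y_\cdot)^2\} + n \mu_\cdot^2 + \sigma_\cdot^2 + (n-1)\lambda$$

and

$$E\{\sum_i (y_i - y_\cdot)^2\} = (n-1)(\sigma_\cdot^2 - \lambda) + \sum_i (\mu_i - \mu_\cdot)^2$$

which is (A. 4).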




Let us look back at these identities and lemmas, recalling that our
$x$'s and $y$'s may be vectors. When we do, we see that $x_i x_l = x_l x_i$ has
never been used. The only strange thing is that $\sigma_i^2$ is not in general
the square of a vector, so there may be no standard deviation. Since
all our results are expressed in terms of variances and covariances,
this will bother us not at all.
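
As an illustrative check, and not part of the original appendix, the sketch below verifies (A. 1) exactly for a few two-dimensional vectors, reading the "square" of a vector as its dyadic (outer) product, and then verifies (A. 3) and (A. 4) for scalar variates with a common covariance. The data values, the value of lambda, and all function names are hypothetical choices made for the example; exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction as F


def outer(u, v):
    """Dyadic product u v of two vectors, as a matrix (list of rows)."""
    return [[a * b for b in v] for a in u]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sum(mats):
    acc = mats[0]
    for m in mats[1:]:
        acc = mat_add(acc, m)
    return acc


# --- Identity (A.1) for vectors (hypothetical data) ---
x = [[F(1), F(2)], [F(0), F(-1)], [F(3), F(5)], [F(-2), F(2)]]
n = len(x)
xbar = [sum(col) / n for col in zip(*x)]
dev = [[a - m for a, m in zip(xi, xbar)] for xi in x]

lhs = mat_sum([outer(xi, xi) for xi in x])
rhs = mat_add(mat_sum([outer(d, d) for d in dev]),
              [[n * e for e in row] for row in outer(xbar, xbar)])
identity_a1_holds = lhs == rhs

# --- Lemma (A.3), (A.4) for scalars with common covariance lam ---
sigma2 = [F(1), F(2), F(1, 2), F(3)]   # assumed variances
mu = [F(0), F(1), F(-1), F(2)]         # assumed means
lam = F(3, 10)                         # assumed common covariance
cov = [[sigma2[i] if i == l else lam for l in range(n)] for i in range(n)]


def quad(a):
    """Variance of the linear combination sum_i a_i y_i."""
    return sum(a[i] * cov[i][l] * a[l] for i in range(n) for l in range(n))


sig_bar = sum(sigma2) / n
mu_bar = sum(mu) / n

var_mean_direct = quad([F(1, n)] * n)
var_mean_a3 = lam + (sig_bar - lam) / n

# E (y_i - ybar)^2 = var(y_i - ybar) + (mu_i - mu_bar)^2, summed over i.
e_ss_direct = sum(
    quad([F(int(k == i)) - F(1, n) for k in range(n)]) + (mu[i] - mu_bar) ** 2
    for i in range(n)
)
e_ss_a4 = (n - 1) * (sig_bar - lam) + sum((m - mu_bar) ** 2 for m in mu)

print(identity_a1_holds, var_mean_direct == var_mean_a3, e_ss_direct == e_ss_a4)
```

Nothing in the vector check relies on the two factors of a dyadic product commuting, which is the point made in the paragraph above.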


B. The model and its properties 
We now suppose that

$$x_{ij} = u_{ij} + v_{i0} + w_{0j} + (\text{general effect})_{00}, \tag{B.1}$$

that is, that

(observed value) = (cell effect) + (column effect) + (row effect)
+ (general effect).

Here $i$ goes from 1 to $c$, $j$ goes from 1 to $r$, and 0 is a label showing
that there was never any subscript there. Notice the distinction between
$v_{i0}$ and $u_{i\cdot}$. One is a column effect, while the other is a column average
of the cell effects.

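We begin by assuming only that the covariance between any cell
effect and any column effect is the same, say $\lambda_{bc}$, that is,

$$\operatorname{cov}(u_{ij}, v_{l0}) = \lambda_{bc}. \tag{B.2}$$

Since the covariance of an average [or of a difference] is the average
[or the difference] of the covariances (so long as there is no overlap),

$$\operatorname{cov}(u_{i\cdot} - u_{\cdot\cdot}, v_{i0} - v_{\cdot 0}) = 0,$$

and, symmetrically

$$\operatorname{cov}(v_{i0} - v_{\cdot 0}, u_{i\cdot} - u_{\cdot\cdot}) = 0,$$

where $r u_{i\cdot} = \sum_j u_{ij}$, and $rc\, u_{\cdot\cdot} = \sum_{ij} u_{ij}$, and $c v_{\cdot 0} = \sum_i v_{i0}$.
Since

$$x_{ij} - x_{i\cdot} - x_{\cdot j} + x_{\cdot\cdot} = u_{ij} - u_{i\cdot} - u_{\cdot j} + u_{\cdot\cdot},$$
$$x_{i\cdot} - x_{\cdot\cdot} = (u_{i\cdot} - u_{\cdot\cdot}) + (v_{i0} - v_{\cdot 0}),$$

the other terms vanishing identically, we have

$$SSBx = SSBu, \tag{B.3}$$
$$SSCx = SSCu + SSCv + r \sum_i (u_{i\cdot} - u_{\cdot\cdot})(v_{i0} - v_{\cdot 0}) + r \sum_i (v_{i0} - v_{\cdot 0})(u_{i\cdot} - u_{\cdot\cdot}).$$

In view of the vanishing covariances that followed from (B. 2), the
expectations of these two sums vanish, so that

$$E(SSBx) = E(SSBu),$$
$$E(SSCx) = E(SSCu) + E(SSCv).$$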


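We now evaluate the expectations involving the $u$'s, making the
assumption

(B. 4) the covariance of any two $u$'s is $\lambda$.

From this it follows at once that the covariance of the averages of any
two nonoverlapping sets of $u$'s is also $\lambda$. Hence, for $i \neq l$,

$$\operatorname{cov}(u_{i\cdot}, u_{l\cdot}) = \lambda,$$
$$\operatorname{cov}(u_{ij}, u_{l\cdot}) = \lambda,$$
$$\operatorname{cov}(u_{i\cdot}, u_{lj}) = \lambda,$$

whence

$$\operatorname{cov}(u_{ij} - u_{i\cdot}, u_{lj} - u_{l\cdot}) = 0.$$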


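Further, $\operatorname{cov}(u_{ij}, u_{i\cdot})$, and then $\operatorname{var}(u_{ij} - u_{i\cdot})$, can be found as follows:

$$\operatorname{cov}(u_{ij}, u_{i\cdot}) = \frac{1}{r} \operatorname{var}(u_{ij}) + \frac{1}{r} \sum_{k \neq j} \operatorname{cov}(u_{ij}, u_{ik}) = \frac{1}{r} \sigma_{ij}^2 + \frac{r-1}{r} \lambda;$$

similarly,

$$\operatorname{cov}(u_{i\cdot}, u_{ij}) = \frac{1}{r} \sigma_{ij}^2 + \frac{r-1}{r} \lambda,$$

and from (A. 3)

$$\operatorname{var}(u_{i\cdot}) = \frac{1}{r} \sigma_{i\cdot}^2 + \frac{r-1}{r} \lambda,$$

where $r \sigma_{i\cdot}^2 = \sum_j \sigma_{ij}^2$, so that

$$\operatorname{var}(u_{ij} - u_{i\cdot}) = \operatorname{var}(u_{ij}) - \operatorname{cov}(u_{ij}, u_{i\cdot}) - \operatorname{cov}(u_{i\cdot}, u_{ij}) + \operatorname{var}(u_{i\cdot})$$
$$= \sigma_{ij}^2 - \frac{2}{r} \sigma_{ij}^2 - 2\,\frac{r-1}{r} \lambda + \frac{1}{r} \sigma_{i\cdot}^2 + \frac{r-1}{r} \lambda$$
$$= \left(1 - \frac{2}{r}\right) \sigma_{ij}^2 + \frac{1}{r} \sigma_{i\cdot}^2 - \frac{r-1}{r} \lambda.$$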


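Now we can apply (A. 4) to the $u_{ij} - u_{i\cdot}$ (for fixed $j$), and to the
$u_{i\cdot}$, obtaining

$$E\{\sum_i (u_{ij} - u_{i\cdot} - u_{\cdot j} + u_{\cdot\cdot})^2\} = (c-1)\left\{\left(1 - \frac{2}{r}\right) \sigma_{\cdot j}^2 + \frac{1}{r} \sigma_{\cdot\cdot}^2 - \frac{r-1}{r} \lambda\right\} + \sum_i (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot})^2,$$

$$E\{\sum_i (u_{i\cdot} - u_{\cdot\cdot})^2\} = \frac{c-1}{r} (\sigma_{\cdot\cdot}^2 - \lambda) + \sum_i (\mu_{i\cdot} - \mu_{\cdot\cdot})^2,$$

where $\mu_{ij}$ is the mean of $u_{ij}$, $c\, \sigma_{\cdot j}^2 = \sum_i \sigma_{ij}^2$, and $rc\, \sigma_{\cdot\cdot}^2 = \sum_{ij} \sigma_{ij}^2$.

Summing the first formula over $j$ yields $E(SSBu)$ on the left,
while multiplying the second by $r$ yields $E(SSCu)$. Hence

$$E(SSBu) = (c-1)(r-1)(\sigma_{\cdot\cdot}^2 - \lambda) + SSB\mu,$$
$$E(SSCu) = (c-1)(\sigma_{\cdot\cdot}^2 - \lambda) + SSC\mu.$$

The terms in the $\mu$'s vanish when $\mu_{ij}$ is a function of $j$ alone.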
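
The two expectation formulas can be checked exactly for a small array. The sketch below assumes scalar cell effects in a 3 by 2 layout, arbitrary unequal variances, a common covariance, and means that depend on j alone; every numerical value and function name is a hypothetical choice for illustration. Each squared deviation is a squared linear form in the u's, so its expectation is the variance of the form plus the square of its mean.

```python
from fractions import Fraction as F
from itertools import product

c, r = 3, 2                                   # columns (i), rows (j)
cells = list(product(range(c), range(r)))     # index pairs (i, j)

# Assumed variances sigma_ij^2, common covariance, and means depending on j only.
sigma2 = {(0, 0): F(1), (0, 1): F(2), (1, 0): F(3),
          (1, 1): F(1, 2), (2, 0): F(5, 2), (2, 1): F(4)}
lam = F(1, 4)
mu = {(i, j): F(10 * (j + 1)) for i, j in cells}


def cov(p, q):
    return sigma2[p] if p == q else lam


def expected_square(coef):
    """E[(sum_p coef_p u_p)^2] = variance of the form + (mean of the form)^2."""
    var = sum(coef[p] * coef[q] * cov(p, q) for p in cells for q in cells)
    mean = sum(coef[p] * mu[p] for p in cells)
    return var + mean * mean


def balance_coef(i, j):
    """Coefficients of u_ij - u_i. - u_.j + u_.. as a linear form in the u's."""
    return {(k, m): F(int(k == i and m == j)) - F(int(k == i), r)
                    - F(int(m == j), c) + F(1, r * c)
            for k, m in cells}


def column_coef(i):
    """Coefficients of u_i. - u_.. as a linear form in the u's."""
    return {(k, m): F(int(k == i), r) - F(1, r * c) for k, m in cells}


e_ssb = sum(expected_square(balance_coef(i, j)) for i, j in cells)
e_ssc = r * sum(expected_square(column_coef(i)) for i in range(c))

sig_bar = sum(sigma2.values()) / (r * c)      # sigma_..^2, the average variance
ssb_formula = (c - 1) * (r - 1) * (sig_bar - lam)
ssc_formula = (c - 1) * (sig_bar - lam)

print(e_ssb == ssb_formula, e_ssc == ssc_formula)
```

Replacing `mu` by means that vary with i as well would leave a nonzero contribution from the terms in the means, in line with the closing remark above.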


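If the covariances of the $v_{i0}$'s are all equal to $\lambda_c$, and their means
are all the same, then (A. 4) implies

$$E(SSCv) = r E\{\sum_i (v_{i0} - v_{\cdot 0})^2\} = r(c-1)\{(\text{average of variances of } v_{i0}) - \lambda_c\}.$$

Summing this up, we have the

Lemma. Suppose

$$x_{ij} = u_{ij} + v_{i0} + w_{0j} + (\text{general effect})_{00},$$

suppose that the covariance between any two $u$'s is $\lambda$, between any two
$v$'s is $\lambda_c$, and between any $u$ and any $v$ is $\lambda_{bc}$, and suppose that the mean
of the $u_{ij}$ depends only on $j$, then

$$E(SSBx) = (c-1)(r-1)\sigma_b^2,$$
$$E(SSCx) = (c-1)(\sigma_b^2 + r\sigma_c^2),$$

where

$$\sigma_b^2 = -\lambda + (\text{average of variances of } u_{ij}),$$
$$\sigma_c^2 = -\lambda_c + (\text{average of variances of } v_{i0}).$$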


Now we set out to show that (4.1) to (4.4) imply certain 
expectation formulas. The hypotheses of the last lemma are satisfied 
both as given, and with rows and columns interchanged. Hence the 
expectation formulas hold with $\sigma_b^2$, $\sigma_c^2$, $\sigma_r^2$ defined as in the lemma.
For $\sigma_c^2$ and $\sigma_r^2$ these definitions coincide with those given in Section 4.
We have only to show that 


$$E\{\sum_{ij} (u_{ij} - u_{i\cdot} - u_{\cdot j} + u_{\cdot\cdot})^2\} = (r-1)(c-1)(\sigma_{\cdot\cdot}^2 - \lambda) = (r-1)(c-1)\sigma_b^2$$


in order to complete the proof. But this is a consequence of (A. 4). 


C. Distribution of the sums of squares 


When the $v_{i0}$ are equal, the column sum of squares is a function
of the

$$u_{i\cdot} - u_{\cdot\cdot}, \qquad i = 1, \ldots, c,$$

while the balance sum of squares is a function of the

$$u_{ij} - u_{i\cdot} - u_{\cdot j} + u_{\cdot\cdot},$$

and because of (10.2) and (4.4) these quantities are jointly normally
distributed with mean zero. The two sets of quantities will be independent,
thus implying (10.3), if the covariances between one of one
set and one of the other vanish.

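If $i \neq l$, we can write down the covariances (some of which are
variances) of $u_{i\cdot}$ and $u_{\cdot\cdot}$ with $u_{lj}$, $u_{l\cdot}$, $u_{\cdot j}$ and $u_{\cdot\cdot}$ by arguments
similar to those following (B. 3). The results are

$$\begin{array}{c|cccc}
 & u_{lj} & u_{l\cdot} & u_{\cdot j} & u_{\cdot\cdot} \\ \hline
u_{i\cdot} & \lambda & \lambda & \lambda + \frac{1}{rc}(\sigma_{ij}^2 - \lambda) & \lambda + \frac{1}{rc}(\sigma_{i\cdot}^2 - \lambda) \\
u_{\cdot\cdot} & \lambda + \frac{1}{rc}(\sigma_{lj}^2 - \lambda) & \lambda + \frac{1}{rc}(\sigma_{l\cdot}^2 - \lambda) & \lambda + \frac{1}{rc}(\sigma_{\cdot j}^2 - \lambda) & \lambda + \frac{1}{rc}(\sigma_{\cdot\cdot}^2 - \lambda)
\end{array}$$

Whence the covariance we want is given by

$$\frac{1}{rc} \{\sigma_{l\cdot}^2 - \sigma_{lj}^2 - \sigma_{ij}^2 + \sigma_{i\cdot}^2 + \sigma_{\cdot j}^2 - \sigma_{\cdot\cdot}^2\}$$

which vanishes when $\sigma_{ij}^2$ depends on $i$ alone.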
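
A direct computation can confirm the covariance between a column deviation and a balance deviation in a different column. The sketch below is illustrative only: it assumes scalar u's, a 3 by 2 layout, a common covariance, and two hypothetical sets of variances, one arbitrary and one depending on i alone. The closing formula it tests is the six-term expression with coefficient 1/(rc).

```python
from fractions import Fraction as F
from itertools import product

c, r = 3, 2
cells = list(product(range(c), range(r)))   # index pairs (i, j)
lam = F(1, 4)                               # assumed common covariance


def column_dev(i):
    """Coefficients of u_i. - u_.. as a linear form in the u's."""
    return {(k, m): F(int(k == i), r) - F(1, r * c) for k, m in cells}


def balance_dev(l, j):
    """Coefficients of u_lj - u_l. - u_.j + u_.. as a linear form in the u's."""
    return {(k, m): F(int(k == l and m == j)) - F(int(k == l), r)
                    - F(int(m == j), c) + F(1, r * c)
            for k, m in cells}


def covariance(a, b, sigma2):
    """Covariance of two linear forms when var(u_p) = sigma2[p], cov = lam."""
    return sum(a[p] * b[q] * (sigma2[p] if p == q else lam)
               for p in cells for q in cells)


def formula(i, l, j, sigma2):
    """(1/rc){s_l.^2 - s_lj^2 - s_ij^2 + s_i.^2 + s_.j^2 - s_..^2}."""
    row_avg = lambda k: sum(sigma2[(k, m)] for m in range(r)) / r
    col_avg = lambda m: sum(sigma2[(k, m)] for k in range(c)) / c
    all_avg = sum(sigma2.values()) / (r * c)
    return (row_avg(l) - sigma2[(l, j)] - sigma2[(i, j)]
            + row_avg(i) + col_avg(j) - all_avg) / (r * c)


general = {(0, 0): F(1), (0, 1): F(2), (1, 0): F(3),
           (1, 1): F(1, 2), (2, 0): F(5, 2), (2, 1): F(4)}
by_column = {(i, j): F(i + 1) for i, j in cells}   # depends on i alone

i, l, j = 0, 2, 1
direct_general = covariance(column_dev(i), balance_dev(l, j), general)
direct_by_column = covariance(column_dev(i), balance_dev(l, j), by_column)

print(direct_general, formula(i, l, j, general), direct_by_column)
```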


When $i = l$, the situation is simpler, since the variances of the $u_{ij}$
are all the same (if $\sigma_{ij}^2$ depends on $i$ alone) and the covariances vanish,
so that we have the independence of a deviation from a sample mean
and the sample mean.

Thus (10.1) is more than sufficient to establish the independence
of the column and balance sums of squares.

Now we come to (10.4) and (10.5). We shall assume (10.1)
without further ado, which implies that $\Delta = E(u_{i\cdot})^2$ is independent of $i$.

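It is well-known that an orthogonal matrix $\{c_i^\alpha\}$, (the $\alpha$ is an
index, not a power!) $0 \le \alpha \le r-1$, $1 \le i \le r$, can be found so that
$c_i^0 = 1/\sqrt{r}$. That is, particular numbers $c_i^\alpha$ can be found so that

$$\sum_i c_i^\alpha c_i^\beta = \begin{cases} 1, & \alpha = \beta, \\ 0, & \alpha \neq \beta, \end{cases} \qquad \sum_\alpha c_i^\alpha c_l^\alpha = \begin{cases} 1, & i = l, \\ 0, & i \neq l, \end{cases}$$

and

$$\sum_i c_i^0 u_{i\cdot} = \sqrt{r}\, u_{\cdot\cdot},$$

where the $u_{ij}$ may be vectors. Define

$$U_\alpha = \sum_i c_i^\alpha u_{i\cdot}$$

then

$$\sum_\alpha U_\alpha^2 = \sum_\alpha (\sum_i c_i^\alpha u_{i\cdot})(\sum_l c_l^\alpha u_{l\cdot}) = \sum_i \sum_l (\sum_\alpha c_i^\alpha c_l^\alpha)\, u_{i\cdot} u_{l\cdot}$$
$$= \sum_i (u_{i\cdot})^2 = \sum_i (u_{i\cdot} - u_{\cdot\cdot})^2 + r u_{\cdot\cdot}^2 = \sum_i (u_{i\cdot} - u_{\cdot\cdot})^2 + U_0^2$$

and

$$\sum_{\alpha=1}^{r-1} U_\alpha^2 = \sum_i (u_{i\cdot} - u_{\cdot\cdot})^2.$$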


This demonstration, valid for $u_{ij}$ of any dimension, establishes (10. 4)
for the row sum of squares, since that sum of squares is

$$\sum_{\alpha=1}^{r-1} (\sqrt{c}\, U_\alpha)^2$$

while the expectation of the $u_{i\cdot}$ is constant, so that the expectation of
$U_\alpha$ is zero. Further,

$$E(U_\alpha U_\beta) = \sum_i c_i^\alpha c_i^\beta E(u_{i\cdot})^2$$

which vanishes for $\alpha \neq \beta$ since the $E(u_{i\cdot})^2$ are all equal and $\{c_i^\alpha\}$ is
orthogonal. When $\alpha = \beta$, we have

$$E(U_\alpha)^2 = \Delta$$

where $\Delta$ is the common value of $E(u_{i\cdot})^2$. Thus both (10.4) and
(10.5) are established for the row sum of squares.

An exactly similar argument, involving an orthogonal transformation
in the $rc$ dimensional space of the $u_{ij}$, proves the same for the error
sum of squares; we omit the details.
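
One concrete orthogonal matrix with a constant first row is the Helmert matrix. The sketch below uses it, as an assumption of convenience since the text does not name any particular matrix, to check that the transformed quantities reproduce the sum of squared deviations when the squares are read as dyadic products of two-dimensional vectors. The data and function names are hypothetical.

```python
import math


def helmert(r):
    """An r x r orthogonal matrix whose row 0 is constant, 1/sqrt(r)."""
    rows = [[1 / math.sqrt(r)] * r]
    for a in range(1, r):
        norm = math.sqrt(a * (a + 1))
        rows.append([1 / norm] * a + [-a / norm] + [0.0] * (r - a - 1))
    return rows


def outer(u, v):
    return [[x * y for y in v] for x in u]


def total(mats):
    """Entrywise sum of a list of equally sized matrices."""
    acc = [[0.0] * len(mats[0][0]) for _ in mats[0]]
    for m in mats:
        acc = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(acc, m)]
    return acc


def close(A, B, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


# Hypothetical averages u_i. : r = 4 vectors of dimension 2.
u = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 4.0]]
r, dim = len(u), len(u[0])
coef = helmert(r)
ubar = [sum(col) / r for col in zip(*u)]

# U_alpha = sum_i c_i^alpha u_i.
U = [[sum(coef[a][i] * u[i][k] for i in range(r)) for k in range(dim)]
     for a in range(r)]

lhs = total([outer(U[a], U[a]) for a in range(1, r)])
dev = [[x - m for x, m in zip(ui, ubar)] for ui in u]
rhs = total([outer(d, d) for d in dev])

identity_holds = close(lhs, rhs)
u0_is_scaled_mean = all(abs(U[0][k] - math.sqrt(r) * ubar[k]) < 1e-9
                        for k in range(dim))
print(identity_holds, u0_is_scaled_mean)
```

Any other orthogonal matrix with the same first row would serve equally well; the Helmert form is used only because it can be written down explicitly.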






STATISTICAL MODELS BEARING ON THE SEMANTICS OF CORRELATION


I. THE UMPIRE BONUS MODEL 


BY LANCELOT HOGBEN, F.R.S. AND KENNETH W. KEMP 
Department of Medical Statistics, University of Birmingham 


1. INTRODUCTION 


STATISTICIANS familiar with the variety of problems amenable
to the technique of correlation, if also charged with the responsibility
of teaching the theory to beginners, must have indulged at
some time or other in either or both of two reflections: (a) that the 
traditional geometrical introduction to linear correlation by least squares 
confronts the student with the dilemma of deciding how regression 
may be linear in one dimension only; (b) that the generality of the 
geometrical approach blurs what is specific about circumstances in which 
correlation between two variates may arise. Some statisticians, notably 
Elderton¹ (1938) and Rietz² (1920), have taken cognisance of the
difficulties of the beginner by recourse to statistical models which 
exhibit linear correlation between two variates. Without employing 
geometrical concepts, these authors have severally examined the proper- 
ties of die and urn models by building up the appropriate correlation 
tables and regression equations with results sufficiently suggestive of 
further enquiry; but reliance on a notation which invokes combinatory 
calculus limits the generality of the conclusions established. We shall 
here show that it is possible to bring into focus identities relevant to 
the semantics of correlation theory, if we derive them from assumptions 
at once more general and more easy to visualise. 

It will be the object of this series of communications to clarify
semantic issues which severally arise in different domains of correlation
by developing the algebraic properties of different classes of statistical
models in a notation which makes explicit the specific peculiarities of
each model and their relevance to situations of practical interest, and
one which does not postulate particular assumptions concerning the
score frequency distribution of the parent universes. This is in fact
possible without invoking any operations other than those which have
the sanction of elementary logistics, in short the arithmetical artifices
of addition, subtraction, multiplication and division.


¹ Frequency Curves and Correlation (Third Edition), 1938.
² Ann. Maths., 21: 306-322, 1920.

In conformity with common usage among contemporary writers, we
use $E(x_a)$ to signify the weighted mean value of a score $x_a$. If therefore
$y_a$ is the proportionate frequency of $x_a$, the mean of the latter for
values of $a$ from 0 to $n$ is in this notation:

$$E(x_a) = \sum_{a=0}^{a=n} y_a \cdot x_a$$


At the outset, it will be convenient to specify three identities, one
implicit in the build-up of any grid, the other two implicit in the build-up
of a grid exhibiting equipartition of opportunity for association, i.e.
statistical independence, each being easily deducible therefrom without
recourse to any operations other than the elementary algorithms. With
respect to any grid with border scores $x_a$ and $x_b$ with corresponding
border frequencies $y_a$ and $y_b$, the condition of independence is defined
by the application of the product rule to each cell score $x_{ab}$ of frequency
$y_{ab}$: $y_{ab} = y_a \cdot y_b$.

(a) The mean values of the score sum and the score difference of
any two scores $x_a$ and $x_b$ are respectively the sum and the difference of
their mean values, i.e. $E(x_a \pm x_b) = M_a \pm M_b$.


(b) If two such scores are independently distributed, the variance
(mean square deviation) of the distribution of the score sum or of the
score difference is the sum of the variances of the individual score
distributions; in our notation $V(x_a \pm x_b) = V_a + V_b$.


(c) If two such scores are independently distributed their covariance
is zero, as is thus evident with respect to a grid with border
column scores $x_a$ labelled from 0 to $c$ and border row scores $x_b$ labelled
from 0 to $r$:

$$\operatorname{Cov}(x_a, x_b) = E(x_a - M_a)(x_b - M_b)$$
$$= \sum_{a=0}^{a=c} \sum_{b=0}^{b=r} y_a \cdot y_b (x_a - M_a)(x_b - M_b)$$
$$= \sum_{a=0}^{a=c} y_a (x_a - M_a) \cdot \sum_{b=0}^{b=r} y_b (x_b - M_b) = 0.$$
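
Identities (a) to (c) can be confirmed on a small grid built by the product rule. The border scores and frequencies below are hypothetical values chosen for the illustration, and the helper `expect` is simply a name for the weighted mean over all cells.

```python
from fractions import Fraction as F

# Hypothetical border scores and proportionate frequencies.
xa, ya = [0, 1, 2], [F(9, 16), F(6, 16), F(1, 16)]
xb, yb = [0, 1], [F(3, 4), F(1, 4)]


def expect(f):
    """Weighted mean of f(x_a, x_b) over the grid, with the cell frequency
    given by the product rule y_ab = y_a * y_b (independence)."""
    return sum(pa * pb * f(a, b)
               for a, pa in zip(xa, ya)
               for b, pb in zip(xb, yb))


M_a = expect(lambda a, b: a)
M_b = expect(lambda a, b: b)
V_a = expect(lambda a, b: (a - M_a) ** 2)
V_b = expect(lambda a, b: (b - M_b) ** 2)

mean_sum = expect(lambda a, b: a + b)
mean_diff = expect(lambda a, b: a - b)
var_sum = expect(lambda a, b: (a + b - mean_sum) ** 2)
var_diff = expect(lambda a, b: (a - b - mean_diff) ** 2)
cov_ab = expect(lambda a, b: (a - M_a) * (b - M_b))

print(mean_sum == M_a + M_b, mean_diff == M_a - M_b)   # identity (a)
print(var_sum == V_a + V_b, var_diff == V_a + V_b)     # identity (b)
print(cov_ab == 0)                                     # identity (c)
```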


Without employing operations other than those specified by (a)-(c),
or deducible therefrom by recourse to the elementary algorithms, we
shall here develop the properties of a model invoked by Elderton.
It will not be necessary to introduce geometric concepts at any stage,
and it will not be necessary to rely on combinatory calculus. We here
call Elderton’s model the umpire bonus model. As prescribed by the
author cited, it signifies the following set-up. Each of two players
A and B respectively toss one and the same die $a$ and $b$ times, their
respective scores being $x_{a.0}$ and $x_{b.0}$. The umpire tosses the die $u$
times. Each player records as his total score $x_a$ or $x_b$, in contradistinction
to his individual score $x_{a.0}$ or $x_{b.0}$, the sum of the latter and the score
($x_u$) of the umpire. That is to say:

$$x_a = x_u + x_{a.0} \quad \text{and} \quad x_b = x_u + x_{b.0} \tag{1.1}$$


Since the player’s luck by his own devices is independent of that of
the umpire, the following relations must subsist between the mean
($M_u$) and variance ($V_u$) of the score distributions of the umpire, of
the means ($M_{a.0}$ and $M_{b.0}$) and variances ($V_{a.0}$ and $V_{b.0}$) of the
individual score distributions of the players, and of the means ($M_a$
and $M_b$) and variances ($V_a$ and $V_b$) of the total score distributions
of the players:

$$M_a = M_u + M_{a.0} \quad \text{and} \quad M_b = M_u + M_{b.0} \tag{1.2}$$
$$V_a = V_u + V_{a.0} \quad \text{and} \quad V_b = V_u + V_{b.0} \tag{1.3}$$


Elderton discusses only the particular case consonant with the
restrictions: (a) that both players and the umpire toss the same die;
(b) that both players perform the same number of independent tosses
$a = (n - u) = b$. Thus the total number ($n$) of tosses which the
players respectively record is the same. In these circumstances $r_{ab}$,
the product-moment coefficient with respect to the players’ scores, is
$u/n$, a result which Elderton illustrates by numerical examples without
offering a general proof in the source cited. The recipe offered for the
computation involves the evaluation of each cell frequency in the
correlation table by consideration of compatible permutations. In fact,
it is possible to adopt a procedure more economical, and without
imposing on the model either of the restrictions specified above, by
recourse to an operation, the visualisation of which has proved helpful
to the senior author as an expository device (a) to clarify the statistical
theory of non-assortative mating; (b) to exhibit the elementary properties
of the hypergeometric series and non-replacement distribution.
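
The value u/n can be reproduced by exact enumeration for a fair six-sided die. The sketch below is an illustration under that assumption, not Elderton's recipe: it builds the distribution of each partial score by repeated convolution and returns the square of the product-moment coefficient as an exact fraction, which avoids taking square roots. The function names are hypothetical.

```python
from fractions import Fraction as F


def sum_dist(k, faces=(1, 2, 3, 4, 5, 6)):
    """Exact distribution of the total of k independent tosses of a fair die."""
    dist = {0: F(1)}
    for _ in range(k):
        new = {}
        for s, p in dist.items():
            for f in faces:
                new[s + f] = new.get(s + f, 0) + p * F(1, len(faces))
        dist = new
    return dist


def corr_squared(a, b, u):
    """Square of the product-moment coefficient of x_a = x_u + x_a.0 and
    x_b = x_u + x_b.0, computed by enumerating all score combinations."""
    du, da, db = sum_dist(u), sum_dist(a), sum_dist(b)

    def expect(f):
        return sum(pu * pa * pb * f(xu + xa0, xu + xb0)
                   for xu, pu in du.items()
                   for xa0, pa in da.items()
                   for xb0, pb in db.items())

    m_a = expect(lambda x, y: x)
    m_b = expect(lambda x, y: y)
    v_a = expect(lambda x, y: (x - m_a) ** 2)
    v_b = expect(lambda x, y: (y - m_b) ** 2)
    cov = expect(lambda x, y: (x - m_a) * (y - m_b))
    return cov * cov / (v_a * v_b)


# Elderton's restricted case: a = b = n - u, here with n = 5 and u = 2.
restricted = corr_squared(3, 3, 2)
# Unequal numbers of individual tosses, outside Elderton's restrictions.
unrestricted = corr_squared(1, 4, 2)

print(restricted, F(2, 5) ** 2)
print(unrestricted)
```

In the restricted case the squared coefficient equals (u/n) squared; in the unrestricted case the same enumeration still applies, which is the freedom the paragraph above claims for the model.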

The staircase diagram of Fig. 1 exhibits a method of exploring the
numerical properties of an umpire bonus model by the composition of
unit correlation tables referable to successive values of $x_u$, the umpire’s
score, appropriately weighted by $y_u$, the corresponding score frequency.
A card pack, as in Fig. 1, or an urn, as in the treatment of Rietz, is
equally appropriate to an approach to correlation without recourse to
geometric concepts, but can reproduce the properties of a comparable
die model, only if the player or umpire replace each card (or ball) drawn
before taking another. In short, Fig. 1 is a replacement model.

Neither of the two restrictions Elderton imposes, as mentioned in 
the last paragraph but one, is implicit in (1.1). It leaves us free to 
postulate : 


(i) sampling from three universes any two of which may or 
may not be identical ; 


(ii) the numbers (a or b) of individual tosses (or selections) 
of the players may or may not be identical with one another or 
with that ($u$) of the umpire.


In such a set-up, correlations of three sorts necessarily arise. Since
the total scores of the players are subject to a constraint arising from
the common contribution of the umpire’s score, there must be a positive
correlation between $x_a$ and $x_b$. We here use $r_{ab}$ for the corresponding
product-moment summarising index. By the same token, each of the
total scores of the players will be positively correlated with that of the
umpire; and we shall denote the corresponding product-moment coefficients
by $r_{au}$ and $r_{bu}$.

Without loss of generality, we can accommodate negative correlation
within the framework of our assumptions by modifying (1.1) to take
into account the result of assigning one player’s total score by subtracting
the umpire’s score from his individual score, viz.:

$$x_a = x_{a.0} - x_u \quad \text{and} \quad x_b = x_{b.0} + x_u. \tag{1.4}$$


It is scarcely necessary to show that positive correlation arises when the 
signs of the two expressions are identical, and negative correlation if 
opposite. In what follows we shall therefore operate within the framework
of (1.1) on the understanding that (1.4) provides a sufficient
basis for extending the validity of any conclusions hereafter derived
to cover negative correlation as well. At a later stage, we shall remove 
a restriction on the relevance of our model situation to real life by 
dispensing with the prescription that each player receives an equal 
bonus from the umpire. By so doing we exhibit (1.1) as a particular 
case of the most general form of linear concomitant variation defined 
by (4.6) below. No doubt, it would be more aesthetically satisfying 
to reverse the procedure we do in fact adopt by first examining the 
more artificial case defined by (1.1); but the course we have chosen 
to follow should make it easier for the statistician who is not a pro- 
fessional mathematician to appreciate the simplicity of the symbolic 
operations we employ. 


[Figure: staircase diagram of unit correlation tables; axis label: A score]

FIG. 1. THE UMPIRE BONUS


The visual model of Fig. 1 exhibits a pattern which suffices to 
prescribe the computation of all classes of situations dealt with in 
what follows. The umpire draws 2 cards with replacement from a full 
pack. Each player (A and B) draws one card ditto adding to his 
individual (heart) score the heart score of the umpire. The umpire’s 
heart score is 0, 1 or 2 with relative frequencies 9:6:1. The player’s 
individual scores are 0 and 1 in the ratio 3:1, and we may therefore 
set out their contingent individual scores thus: 


              A
            0    1

       0 |  9    3
   B
       1 |  3    1


To get the relative frequency of their total scores in virtue of adding
the umpire’s score of 0, 1 or 2, we must weight the corresponding tables
of totals in the ratio 9 : 6 : 1:

   Umpire 0 (weight 9)      Umpire 1 (weight 6)      Umpire 2 (weight 1)

           A                        A                        A
         0    1                   1    2                   2    3

    0 | 81   27              1 | 54   18              2 |  9    3
 B                        B                        B
    1 | 27    9              2 | 18    6              3 |  3    1

We thus obtain the final table:

                    A
           0      1      2      3

      0 | 81     27      -      -

      1 | 27    9+54    18      -
   B
      2 |  -     18     6+9     3

      3 |  -      -      3      1
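
The composition of the final table can be retraced mechanically. The sketch below takes the frequencies 9 : 6 : 1 for the umpire and 3 : 1 for each player from the text, accumulates the weighted unit tables, and then computes the product-moment coefficient of the two total scores; the variable names and the layout `table[b][a]` are assumptions of the illustration.

```python
from fractions import Fraction as F

umpire = {0: 9, 1: 6, 2: 1}   # heart score of the umpire : relative frequency
player = {0: 3, 1: 1}         # individual heart score of a player : frequency

# table[b][a] = relative frequency of total scores B = b and A = a.
table = [[0] * 4 for _ in range(4)]
for xu, wu in umpire.items():
    for a0, wa in player.items():
        for b0, wb in player.items():
            table[b0 + xu][a0 + xu] += wu * wa * wb

total = sum(map(sum, table))


def expect(f):
    return sum(F(table[b][a], total) * f(a, b)
               for a in range(4) for b in range(4))


M_a = expect(lambda a, b: a)
M_b = expect(lambda a, b: b)
V_a = expect(lambda a, b: (a - M_a) ** 2)
V_b = expect(lambda a, b: (b - M_b) ** 2)
cov_ab = expect(lambda a, b: (a - M_a) * (b - M_b))

# V_a equals V_b here, so the coefficient is an exact fraction.
r_ab = cov_ab / V_a

for row in table:
    print(row)
print(total, r_ab)
```

Here u = 2 and a = b = 1, so n = 3, and the coefficient agrees with the value u/n quoted earlier for Elderton's restricted case.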


2. PRELIMINARY DEFINITIONS AND NOTATION 


Since the object of this communication is to clarify some semantic 
aspects of correlation, we here propose to introduce terms which the 
foregoing definitions of our model situation suffice to elucidate. We 
shall first distinguish between two types of logical relationship of 
which the correlation coefficient is a summarising index. No statistician 
needs to be reminded of the fact that significant correlation between 
two variates $x_a$ and $x_b$ may connote either of two relationships: (a) an
antecedent relation of one to the other, such as we here call consequence; 
(b) their joint dependence on a common antecedent, denoted in this 
context by concurrence. Commonly, the physical sciences deal with 
consequent relations in this sense, for instance the effect of increasing 
the load on the length of a spring. Not uncommonly, the biologist 
and the sociologist seek clues to such relationships by first exploring 
concurrences such as the IQ of brothers and sisters. Perhaps because
physics has cast such a long shadow over the quantitative approach to 
biological and sociological problems, a distinction so trite at the verbal 
level receives little if any explicit recognition in the traditional treat- 
ment of regression and correlation by recourse to geometrical reasoning. 
It is therefore a matter of peculiar interest that the model which is 
the theme of the present communication makes explicit this distinction
in the build-up of the product-moment coefficient and in the derivation
of the regression equations. 

It might seem more fitting to designate as dependent, relations of 
the kind specified as (a) in the foregoing paragraph. Our choice of a 
new epithet is deliberate. In the universe of mathematical discourse 
dependence has a connotation referable to the convenience of expressing 
a particular variable as a function of another or others. The physicist 
in search of laws within the domain of consequence, as defined above, 
customarily operates within the framework of a convention that the 
dependent variable specifies what is ordinarily the consequent in contra- 
distinction to the antecedent at the operational level. This juxtaposition 
of causality and formal dependence is merely a convention; but no 
ambiguity arises in experimental science from the fact that mathematical 
dependence has no logical connexion with the dichotomy under discussion. 
In sociology and biology, this is not so. It is often impossible to know 
whether a relationship under discussion is consequent or concurrent. 
Even if we have reason to suspect that the former holds good, it may 
be difficult to decide which variable is consequent and which variable 
is antecedent in the operational sense. To carry over the verbal conventions
of physics into the domain of biometry or social statistics
therefore exacts the penalty of perpetual vigilance. Indeed, few familiar 
with past controversies concerning the uses of correlation and regression 
in the field of heredity and of political economy would wish to deny the 
danger of identifying a convention of formal dependence with the 
assumption of causality. 

In virtue of (1.1) the relation of the two scores $x_a$ and $x_b$ is a
concurrent relation; but that of $x_a$ to $x_u$ or of $x_b$ to $x_u$ is a consequent
relation in the sense defined. Accordingly, we shall see that the joint
relation of $r_{ab}$ to $r_{au}$ and $r_{bu}$ makes manifest a formal criterion by
which we may distinguish the one from the other.

We shall later show that the properties of the umpire bonus model 
admit of situations in which there is linear regression in both dimensions, 
linear regression in one dimension only and linear regression in neither. 
To sidestep confusion with respect to such situations, in what follows 
we shall adopt the following subscript conventions: 


(a) $M_{ba}$ for the mean value of $x_b$ associated with a particular value
of $x_a$, and $M_{ab}$ for the mean value of $x_a$ associated with a particular
value of $x_b$, the definition of $M_{ua}$, $M_{au}$, $M_{bu}$, $M_{ub}$ being consistent therewith;




(b) for the variance of the B-score means corresponding to successive
values of $x_a$, we write $V(M_{ba})$, and $V(M_{ab})$ for the variance of the A-score
means associated with particular values of $x_b$;


(c) for the mean variance of the B-score distributions corresponding
to particular values of $x_a$, we write $M(V_{ba})$, and for the mean variance
of the A-score distributions corresponding to particular values of $x_b$,
we write $M(V_{ab})$;


(d) for the covariances, we write: $\operatorname{Cov}(x_a x_b)$; $\operatorname{Cov}(x_a x_u)$; $\operatorname{Cov}(x_b x_u)$.
In this notation, evidently:

$$\operatorname{Cov}(x_{a.0} x_{b.0}) = \operatorname{Cov}(x_{a.0} x_u) = \operatorname{Cov}(x_{b.0} x_u) = 0; \tag{2.1}$$


(e) to define the equations of regression we also require a fixed
subscript convention, and shall employ $k_{ba}$ for the regression coefficient
of the score of player B on that of player A. Accordingly, the equation
of linear regression of the score of player B on that of player A takes
the form:

$$M_{ba} - M_b = k_{ba}(x_a - M_a) \tag{2.2}$$

Likewise, the equation of linear regression of A on B takes the form:

$$M_{ab} - M_a = k_{ab}(x_b - M_b) \tag{2.3}$$
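
For a concrete instance of (2.2), the sketch below takes the final frequency table of the card illustration in Section 1, computes the mean B score for each A score, and tests whether those means advance by equal steps. The table is copied from that illustration; the variable names are hypothetical, and the check covers this one example only, not the general question of when regression is linear.

```python
from fractions import Fraction as F

# table[b][a]: relative frequency of total scores B = b and A = a
# (final table of the card illustration in Section 1).
table = [[81, 27, 0, 0],
         [27, 63, 18, 0],
         [0, 18, 15, 3],
         [0, 0, 3, 1]]
scores = range(4)
total = sum(map(sum, table))

col_total = [sum(table[b][a] for b in scores) for a in scores]
row_total = [sum(table[b]) for b in scores]

# M_ba: mean B score associated with each particular A score.
M_ba = [F(sum(b * table[b][a] for b in scores), col_total[a]) for a in scores]
M_a = F(sum(a * col_total[a] for a in scores), total)
M_b = F(sum(b * row_total[b] for b in scores), total)

steps = [M_ba[a + 1] - M_ba[a] for a in range(3)]
k_ba = steps[0]
satisfies_2_2 = all(M_ba[a] - M_b == k_ba * (a - M_a) for a in scores)

print(M_ba, steps, satisfies_2_2)
```

In this example the column means rise by the same amount for each unit increase in the A score, so (2.2) holds with that common step as the regression coefficient.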


At a later stage, it will be our task to examine in what circumstances 
either or neither of these equations holds good. Here we may anticipate 
the conclusion that a model consonant with the system of scoring 
defined by (1.1) may in fact behave in a way consistent with neither 
(2.2) nor (2.3). Consequently, it will be useful to draw a distinction 
between linear regression and linear correlation. The following con- 
siderations justify the use of the latter expression with respect to any 
system defined by (1.1). As we increase u, the number of tosses of 
the umpire relative to $a$ and to $b$, we necessarily increase the proportionate
contribution of the umpire’s bonus ($x_u$) to the player’s total
score ($x_a$ or $x_b$), and the contour of the scatter diagram for the total
scores of the players approximates more and more closely to a straight 
line. In naturalistic terms, we may think of the total scores of the 
players as two measurements subject to a common system of constraints 
with freedom to vary independently in virtue of circumstances with 
effects peculiar to each. If the withdrawal of this individual freedom 
by experimental control imposes on their joint variation a linear relationship,
it is surely appropriate to say that they are subject to a linear law
of concomitant variation. 

Such is the relationship we here denote by linear correlation, and 
such is the relation implicit in the scoring system of the umpire bonus 
model as defined in (1.1). We might, of course, impose on our model 
a law of curvilinear correlation by allocating to one player the umpire’s 
raw score and to the other player the square of the umpire’s raw score, 
so that $x_a = x_{a.0} + x_u$ and $x_b = x_{b.0} + x_u^2$; but we shall not examine
the properties of such a model in this context. It suffices to say that
the relation of $x_a$ to $x_b$ then approaches more and more nearly to a
quadratic form as we restrict their freedom to vary without reference 
to the concomitant source, e.g. by increasing the relative number of 
tosses the umpire performs. Linear correlation as defined above is the 
law of the umpire bonus model prescribed by Elderton with or without 
restriction concerning the nature of the die or the number of independent 
tosses allocated to the players; and it is our submission that (1.1) 
formally defines a situation of which we must predicate a linear law 
of concomitant variation in any meaningful sense of the term. That 
it is necessary to state this view with emphasis will be evident from a 
statement of Snedecor,³ expressing a view widely held: “the discovery
of a precise description of the concomitant variation of two or more 
quantities is one of the problems of curve fitting known as curvilinear 
regression.” The relevance of this to what follows is implicit in a 
preceding assertion, which we shall justify later. If linear regression 
in one dimension only, or in neither, is consistent with its properties, 
our model may in fact serve to bring into focus how much we can really 
learn about the nature of concomitant variation from the existence or 
non-existence of linear regression. 

Since we here use the term linear in deference to established usage, 
it is appropriate to emphasise that we are not introducing a geometrical 
concept by the back door. For the nature of the model implies that our 
score values are discrete and excludes the postulate that the score dis- 
tribution is continuous. All that we can signify by the distinction 
between linear and non-linear regression of the score of player B on the 
score of player A is whether $M_{ba}$ does or does not increase by equal
increments with respect to successive values of $x_a$ which likewise increase
by equal increments.⁴


® Statistical Methods (Fourth Edition), 1946, p. 374. 
‘This assumption implies the form of the regression equation specified by 


| 

e 

| 

bs | 
| 

). 

| 

d 

it ‘ 

n 

ag 

) 

Pg 

te 1 

ig 4 

or 

mn 

ly 

of 

r- 

al 

al } 

nt 

1e 

ts | 4 

th 


120 LANCELOT HOGBEN AND KENNETH W. KEMP 


These preliminary considerations invite comment with respect to the 
particular interpretation we impose on the correlation ratios with respect 
to the scores of players A and B: 


nea” = V(Mora)/V» and V (Mav) /Va = nar”. 


In our notation, the corresponding correlation ratios for the joint 
variation of the total score of player and umpire are: 


V(Mou)/Vo and V(Mys)/Vu, 
V(Mou)/V> and V(Mu)/Vu. 


It is widely current usage to regard a ratio of this form as definitive 
of explained variation when regression is linear; and it is therefore 
appropriate to be explicit about what meaning we attach to explanation 
in this context. Searching semantic reexamination of the theory of 
correlation must at the outset take cognisance of the fact that variance 
is not in fact a unique measure of variation on any grounds other than 
its mathematical convenience; but we need not relinquish the advan- 
tages of this convenience, if we clearly recognise what is arbitrary in 
customary usage. At this stage, we shall not anticipate the results of 
subsequent examination of circumstances in which we can legitimately 
identify the proportionate contribution of the variance of the distribu- 
tion of the column (or row) means to the variance of the distribution 
of the column (or row) border scores of a correlation table with the 
fraction of the latter explicable in virtue of an agency common to the 
relevant variates. We are here content to define two quantities $C_a(x_a, x_b)$
and $C_b(x_a, x_b)$ respectively referable to that fraction of the total variance
of $x_a$ and $x_b$ identifiable with the variance of the distribution of their
common component. The elementary definition of these components of
concomitant variation follows from the breakdown of total score variance 
in (1.3). The latter exhibits the variance of the total score distribu- 
tion of the player as the sum of two components, one being the inde- 


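(2.2) and (2.3) above. Given that successive values of $M_{ba}$ for successive values
of $x_a$ increasing or decreasing by equal increments constitute an A. P., we may
write:


$$M_{ba} = k_{ba} x_a + \text{const.},$$
$$E(M_{ba}) = k_{ba} E(x_a) + \text{const.},$$
$$M_{ba} - M_b = k_{ba}(x_a - M_a).$$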


CORRELATION MODELS 121 


pendent variance of the player’s individual score distribution, the other 
being the concomitant variance of the umpire’s score distribution. 
In conformity with the definition of our coefficients of concomitant 
variation : 
$$C_a(x_a, x_b) = V_u/V_a \quad\text{and}\quad C_b(x_a, x_b) = V_u/V_b. \tag{2.4}$$


From (1.3) we see that the fraction of variance of the distribution
of $x_a$ or $x_b$ attributable in this sense to a common component is capable
of alternative expression in so far as $V_u$ is the common component of
either $V_a$ and $V_u$ or $V_a$ and $V_b$. When we turn our attention from the
concurrence of $x_a$ with respect to $x_b$ to the consequence of $x_a$ or $x_b$ with
respect to $x_u$, we may in fact write with equal propriety by recourse to
the same notation:

$$C_a(x_a, x_u) = V_u/V_a = C_a(x_a, x_b), \tag{2.5}$$

$$C_b(x_b, x_u) = V_u/V_b = C_b(x_a, x_b). \tag{2.6}$$

It goes without saying that this symbolism makes explicit the relation:

$$C_u(x_a, x_u) = V_u/V_u = 1 = C_u(x_b, x_u).$$


Since we do not employ geometrical concepts in our treatment of the 
properties of the umpire bonus model, it may be of use to the reader to 
set out the vertical-horizontal subscript conventions we shall employ 
by reference to a grid as in Table 1. 


TABLE 1
Fundamental correlation grid

BORDER                                                         ROW         ROW
SCORES      x_a=0   x_a=1   x_a=2   ...   x_a=j   ...   x_a=c   FREQUENCY   MEAN
x_b=0                                                           y_b0        M_a0
x_b=1                                                           y_b1        M_a1
x_b=2                                                           y_b2        M_a2
...
x_b=i                                                           y_bi        M_ai
...
x_b=r                                                           y_br        M_ar
COLUMN
FREQUENCY   y_a0    y_a1    y_a2    ...   y_aj    ...   y_ac
COLUMN
MEAN        M_b0    M_b1    M_b2    ...   M_bj    ...   M_bc




In what follows we use $E(x_{ab})$ to signify the mean value of a score
function $x_{ab}$ of $x_a$ and $x_b$ (e.g. the product $x_a \cdot x_b$) with respect to all
cells of the grid. If the latter has consecutive column border scores $x_a$
labelled $0, \ldots, c$ and consecutive row border scores $x_b$ labelled $0, \ldots, r$:


$$E(\ldots) = \sum_{a=0}^{a=c} \sum_{b=0}^{b=r} y_{ab}(\ldots).$$


It will be more easy to exhibit as tautologies of a grid certain relations 
we shall later employ, if we introduce subscripts to distinguish this 
operation from that of: 


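(a) taking the mean value in the A-dimension (column by column)
of the grid of a score (e.g. the column variance) with frequency $y_{aj}$
definitive of a column, or in the B-dimension (row by row) of a score
(e.g. the row mean) with frequency $y_{bi}$ definitive of a row:


$$E_a(\ldots) = \sum_{j=0}^{j=c} y_{aj}(\ldots) \quad\text{and}\quad E_b(\ldots) = \sum_{i=0}^{i=r} y_{bi}(\ldots);$$


(b) taking the mean value of a score within the j-th column or
within the i-th row:


$$E_{ba}(\ldots) = \frac{1}{y_{aj}} \sum_{b=0}^{b=r} y_{jb}(\ldots) \quad\text{and}\quad E_{ab}(\ldots) = \frac{1}{y_{bi}} \sum_{a=0}^{a=c} y_{ai}(\ldots).$$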


In this notation: 


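$$E_a \cdot E_{ba}(\ldots) = E(\ldots) = E_b \cdot E_{ab}(\ldots) \tag{2.7}$$


If $u_a$ has a single value for any value of $x_a$ and is a function of $x_a$
alone, $v_b$ being likewise a single-valued function of $x_b$ alone, we may
write:

$$E_a(u_a) = M_u = E(u_a) \quad\text{and}\quad E_b(v_b) = M_v = E(v_b). \tag{2.8}$$


In the domain of the operation $E_{ab}$, $v_b$ remains constant as does $u_a$ in
the domain of the operation $E_{ba}$, so that

$$E_{ba}(u_a \cdot v_b) = u_a \cdot E_{ba}(v_b) = u_a \cdot M_{va},$$

$$E_{ab}(u_a \cdot v_b) = v_b \cdot E_{ab}(u_a) = v_b \cdot M_{ub},$$

$$E(u_a \cdot v_b) = E_a \cdot E_{ba}(u_a \cdot v_b) = E_a(u_a \cdot M_{va}),$$

and

$$E(u_a \cdot v_b) = E_b \cdot E_{ab}(u_a \cdot v_b) = E_b(v_b \cdot M_{ub}).$$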




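In particular, when $u_a = x_a$ and $v_b = x_b$,

$$E_a(x_a \cdot M_{ba}) = E(x_a \cdot x_b) = E_b(x_b \cdot M_{ab}),$$
$$E_a(x_a \cdot M_{ba}) = \mathrm{Cov}(x_a, x_b) + M_a M_b = E_b(x_b \cdot M_{ab}). \tag{2.9}$$

In this notation we write for the column and row variances:

$$V_{ba} = E_{ba}(x_b - M_{ba})^2 = E_{ba}(x_b^2) - M_{ba}^2,$$
$$V_{ab} = E_{ab}(x_a - M_{ab})^2 = E_{ab}(x_a^2) - M_{ab}^2.$$

For their mean values we then write explicitly:

$$M(V_{ba}) = E_a(V_{ba}) = E(x_b^2) - E_a(M_{ba}^2), \tag{2.10}$$
$$M(V_{ab}) = E_b(V_{ab}) = E(x_a^2) - E_b(M_{ab}^2). \tag{2.11}$$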


With the same convention the total variances of the two sets of border 
scores are respectively 


$$V_b = E(x_b^2) - M_b^2 \quad\text{and}\quad V_a = E(x_a^2) - M_a^2. \tag{2.12}$$

For the variance of the column means and of the row means we have

$$V(M_{ba}) = E_a(M_{ba}^2) - M_b^2 \quad\text{and}\quad V(M_{ab}) = E_b(M_{ab}^2) - M_a^2. \tag{2.13}$$

By combining (2.13) with (2.10) or (2.11) we thus obtain

$$M(V_{ba}) + V(M_{ba}) = E(x_b^2) - M_b^2 = V_b, \tag{2.14}$$

$$M(V_{ab}) + V(M_{ab}) = E(x_a^2) - M_a^2 = V_a. \tag{2.15}$$
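
Since (2.14) and (2.15) are stated as properties of any grid whatsoever, an exact computation on an arbitrary table of counts can illustrate them. The following is a minimal sketch and not part of the original paper: the function name `column_partition` and the sample counts are illustrative assumptions, and the border scores are taken to be the row and column indices 0, 1, 2, ...

```python
from fractions import Fraction as F

def column_partition(counts):
    """counts[b][a] = cell count for row score x_b = b and column score x_a = a.

    Returns (M(V_ba), V(M_ba), V_b) as exact fractions.
    """
    total = sum(map(sum, counts))
    rows, cols = range(len(counts)), range(len(counts[0]))
    y = [[F(counts[b][a], total) for a in cols] for b in rows]  # relative frequencies y_ab

    m_b = sum(y[b][a] * b for b in rows for a in cols)                 # M_b
    v_b = sum(y[b][a] * b * b for b in rows for a in cols) - m_b ** 2  # V_b, as in (2.12)

    mean_of_col_var = F(0)          # accumulates M(V_ba) = E_a(V_ba)
    second_moment_of_means = F(0)   # accumulates E_a(M_ba^2)
    for a in cols:
        y_a = sum(y[b][a] for b in rows)   # column frequency
        if y_a == 0:
            continue                        # an empty column contributes nothing
        m_ba = sum(y[b][a] * b for b in rows) / y_a                  # column mean
        v_ba = sum(y[b][a] * b * b for b in rows) / y_a - m_ba ** 2  # column variance
        mean_of_col_var += y_a * v_ba
        second_moment_of_means += y_a * m_ba ** 2
    var_of_col_means = second_moment_of_means - m_b ** 2             # V(M_ba), as in (2.13)
    return mean_of_col_var, var_of_col_means, v_b

# Hypothetical counts chosen only for illustration (not data from the paper).
mean_var, var_mean, v_b = column_partition([[1, 2, 0], [3, 1, 2], [0, 4, 2]])
print(mean_var + var_mean == v_b)
```

Exact fractions are used so that the equality is tested without rounding error; transposing the table of counts gives the corresponding check of (2.15).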

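One to one correspondence, i.e. perfect correlation, of $x_a$ and $x_b$
signifies that $M(V_{ba}) = 0 = M(V_{ab})$, whence by definition
$\eta_{ba}^2 = 1 = \eta_{ab}^2$. Also statistical independence of $x_a$ and $x_b$ implies that
$y_{ab} = y_a \cdot y_b$, whence all values of $M_{ba}$ and all values of $M_{ab}$ are identical, so that
$V(M_{ba}) = 0 = V(M_{ab})$, in which case $\eta_{ba}^2 = 0 = \eta_{ab}^2$. When regression
of $x_b$ on $x_a$ is linear, we may write:


$$x_a \cdot M_{ba} = k_{ba} \cdot x_a^2 - k_{ba} M_a \cdot x_a + M_b \cdot x_a,$$
$$E_a(x_a \cdot M_{ba}) = k_{ba} E_a(x_a^2) - k_{ba} M_a \cdot E_a(x_a) + M_b \cdot E_a(x_a)$$
$$= k_{ba} E(x_a^2) - k_{ba} M_a^2 + M_a M_b$$
$$= k_{ba} V_a + M_a M_b.$$


Whence in virtue of (2.9) above

$$\mathrm{Cov}(x_a, x_b) = k_{ba} V_a. \tag{2.16}$$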




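Likewise, when regression of $x_a$ on $x_b$ is linear:

$$\mathrm{Cov}(x_a, x_b) = k_{ab} V_b.$$

When there is linear regression of $x_b$ on $x_a$ we may write:

$$M_{ba}^2 = [k_{ba}(x_a - M_a) + M_b]^2 = k_{ba}^2 (x_a - M_a)^2 + 2 k_{ba} M_b (x_a - M_a) + M_b^2,$$
$$E_a(M_{ba}^2) = k_{ba}^2 E_a(x_a - M_a)^2 + 2 k_{ba} M_b E_a(x_a - M_a) + M_b^2 = k_{ba}^2 V_a + M_b^2,$$
$$E_a(M_{ba}^2) - M_b^2 = V(M_{ba}) = k_{ba}^2 V_a,$$
$$V(M_{ba})/V_b = k_{ba}^2 V_a/V_b = \eta_{ba}^2. \tag{2.17}$$

If $r_{ab}$ is the product-moment index with respect to $x_a$ and $x_b$, in virtue
of (2.16),

$$r_{ab}^2 = \mathrm{Cov}^2(x_a, x_b)/V_a V_b = k_{ba}^2 V_a/V_b.$$


Whence by (2.17) linear regression of $x_b$ on $x_a$ implies

$$\eta_{ba}^2 = V(M_{ba})/V_b = r_{ab}^2. \tag{2.18}$$


By the same token, linear regression of $x_a$ on $x_b$ implies:


$$\eta_{ab}^2 = V(M_{ab})/V_a = r_{ab}^2. \tag{2.19}$$

It follows from (2.16) that

$$k_{ba}^2 = r_{ab}^2 (V_b/V_a),$$
$$k_{ba} = r_{ab} (\sigma_b/\sigma_a). \tag{2.20}$$


Likewise, we may write

$$k_{ab} = r_{ab} (\sigma_a/\sigma_b).$$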

That (2.18) and (2.19) define a sufficient as well as a necessary 
condition of linear regression in one or other dimension of the grid 
is also implicit in the relation specified by (2.16), in which we may 
write for brevity $\mathrm{Cov}(x_a, x_b) = C_{ab}$, so that:


$$k_{ba} = C_{ab}/V_a.$$

Hence linear regression of $x_b$ on $x_a$ signifies that:


$$(M_{ba} - M_b) = C_{ab}(x_a - M_a)/V_a,$$
$$(M_{ba} - M_b) - C_{ab}(x_a - M_a)/V_a = 0.$$




If we denote the expression on the left by $D$, linear regression of $x_b$
on $x_a$ signifies that $D = 0$ for all values of $x_a$, as must be true if its mean
($M_d$) and variance ($V_d$) are each zero. That the mean is zero is evident, since:

$$E_a(M_{ba} - M_b) = 0 = E_a(x_a - M_a).$$


For the variance of $D$ we have


$$V_d = E_a(D^2) - [E_a(D)]^2 = E_a(D^2)$$
$$= E_a\left[(M_{ba} - M_b) - \frac{C_{ab}}{V_a}(x_a - M_a)\right]^2$$
$$= V(M_{ba}) - \frac{2 C_{ab}}{V_a} E_a(M_{ba} - M_b)(x_a - M_a) + \frac{C_{ab}^2}{V_a}.$$


In the above we may write:

$$E_a(M_{ba} - M_b)(x_a - M_a) = E_a(M_{ba} \cdot x_a) - M_a E_a(M_{ba}) - M_b \cdot E_a(x_a) + M_a M_b$$
$$= E_a(x_a \cdot M_{ba}) - M_a M_b = C_{ab}.$$
$$\therefore\ V_d = V(M_{ba}) - (C_{ab}^2/V_a). \tag{2.21}$$

The condition defined by (2.18) implies that


$$V(M_{ba})/V_b = C_{ab}^2/V_a V_b,$$
$$V(M_{ba}) - C_{ab}^2/V_a = 0.$$


In virtue of (2.21) the identity defined by (2.18) therefore signifies
that $V_d = 0$. Since the value of $M_d$ is also zero, this means that $D = 0$
and regression of $x_b$ on $x_a$ is therefore linear when $\eta_{ba}^2 = r_{ab}^2$.

The sufficient and necessary condition of linear regression of $x_a$
on $x_b$ is deducible in the same way. The reader will notice that the
notation we here employ exhibits the relations between the product-
moment index, the regression coefficients and the correlation ratio as
tautologies of a grid structure without recourse to geometrical reasoning,
to the logically gratuitous postulate of a continuum, or to such specialised
postulates as are implicit in the method of least squares and the bivariate
normal surface. In virtue of (2.14) and (2.18) linear regression in
either dimension of the grid signifies that the possible range of $r_{ab}$ is
$\pm 1$; but we shall later see that this, though a sufficient condition for




defining as such the exact limits of the range of the product-moment 
index, is not in fact a necessary one. 


3. NUMERICAL ILLUSTRATIONS OF THE UMPIRE BONUS MODEL 


Though our first concern is to clarify semantic aspects of correlation 
by exploring the algebraic properties of the model which is the theme 
of this communication, it will not be unprofitable to emphasise its 
heuristic value by citing numerical examples to illustrate the result 
of removing the restrictions which Elderton imposes on his exposition 
of its peculiarities. Neglect of its merits as a heuristic device is perhaps
attributable to the cumbersome notation which Elderton himself employs 
to construct the correlation grid appropriate to a particular prescription 
with respect to the nature of the die and the number of tosses assigned 
to umpire or players. The ease with which it is possible to do so by 
the staircase operation places at the disposal of the student not as yet 
conversant with the theory of correlation the means of generating an 
endless variety of self-made examples involving trivial arithmetical 
labour, and hence of obtaining a broad preview to the algebraic treat- 
ment of the problem without threading a maze of laborious computa- 
tions in the first stages of unfamiliarity with the appropriate operations. 

Since we shall not ourselves impose on our treatment of the problem 
any restriction concerning the nature of the die, it is pertinent to call 
attention to the theoretical possibility of constructing an almost limitless 
range of unbiased dice by shearing the surface of a sphere and allocating 
consecutive scores to the resulting faces. For our present purpose, it 
will suffice to employ three classes: 


(a) cubical, here signifying the common die of craps with consecu-
tive scores of 1-6 on its several faces, so that the unit sampling (i.e.
single toss) distribution is that of a 6-class rectangular universe defined
by the multinomial $(\tfrac{1}{6}+\tfrac{1}{6}+\tfrac{1}{6}+\tfrac{1}{6}+\tfrac{1}{6}+\tfrac{1}{6})^1$;


(b) tetrahedral, admitting four possibilities if the scores on the
faces are consecutive, viz.:

(i) a rectangular 4-class universe whose unit sampling distribu-
tion conforms to the multinomial $(\tfrac{1}{4}+\tfrac{1}{4}+\tfrac{1}{4}+\tfrac{1}{4})^1$;

(ii) a 3-class binomial universe as when one face carries one
pip, one face three pips and the two remaining faces each carry
2 pips, so that the unit sampling distribution conforms to the
terms of the expansion $(\tfrac{1}{2}+\tfrac{1}{2})^2$;




(iii) a 2-class binomial universe specified by $(\tfrac{3}{4}+\tfrac{1}{4})^1$ when
3 faces have m pips, the remaining face having (m + 1);


(iv) a 2-class universe which is both binomial and rectangular
in conformity with the definitive formula $(\tfrac{1}{2}+\tfrac{1}{2})^1$, when 2 faces
have m and the other 2 faces have (m + 1) pips;


(c) discoidal, like a penny with the same formal properties as (iv) 
of (b) above. 


For each illustrative example below we cite identities or inequalities 
relevant to the prescribed conditions in accordance with the general 
treatment of §4 and §5 below. With one exception the symbols 
employed are as already defined in §1 and § 2 above. 


Example 1 


Umpire and players A and B toss the same die, which is a flat disc having
0 pips on one face and 1 pip on the other. The tosses of A, B, and the umpire
are subject to the following conditions: 
(a) the umpire tosses twice, 
(b) player A tosses once, 
(c) player B tosses twice. 
The numerical results relevant to what follows are: 
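(i) $V_u = \tfrac{1}{2} = \mathrm{Cov}(x_a, x_b) = \mathrm{Cov}(x_a, x_u) = \mathrm{Cov}(x_b, x_u)$
(ii) $M_u = 1$ ; $M_a = \tfrac{3}{2}$ ; $M_b = 2$
(iii) $V_u = \tfrac{1}{2}$ ; $V_a = \tfrac{3}{4}$ ; $V_b = 1$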
(iv) =*%=V,/V, ; 
= 
(v) V(M,,) =% ; 
(vi) 
(vii) $k_{ba} = \tfrac{2}{3} = (V_u/V_a)$ ; $k_{ab} = \tfrac{1}{2} = (V_u/V_b)$.
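
The means, variances and covariances of Example 1 can be checked by listing every equally likely sequence of tosses. What follows is a minimal sketch rather than the authors' staircase operation: the helper names `enumerate_scores`, `mean` and `cov` are illustrative assumptions, and the rule that each player's total is the umpire's score plus his own follows the basic assumption (1.1).

```python
from fractions import Fraction as F
from itertools import product

def enumerate_scores(faces, u, a, b):
    """All equally likely outcomes (x_u, x_a, x_b) when everyone tosses the same die.

    The umpire tosses u times, player A a times, player B b times; each
    player's total is his own score plus the umpire's score.
    """
    outcomes = []
    for tu in product(faces, repeat=u):
        for ta in product(faces, repeat=a):
            for tb in product(faces, repeat=b):
                x_u = sum(tu)
                outcomes.append((x_u, x_u + sum(ta), x_u + sum(tb)))
    return outcomes

def mean(values):
    values = list(values)
    return F(sum(values), len(values))

def cov(xs, ys):
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

# Example 1: a flat disc scoring 0 or 1; umpire tosses twice, A once, B twice.
xu, xa, xb = zip(*enumerate_scores(faces=(0, 1), u=2, a=1, b=2))

summary = {
    "M_u": mean(xu), "M_a": mean(xa), "M_b": mean(xb),
    "V_u": cov(xu, xu), "V_a": cov(xa, xa), "V_b": cov(xb, xb),
    "Cov_ab": cov(xa, xb), "Cov_au": cov(xa, xu), "Cov_bu": cov(xb, xu),
}
summary["k_ba"] = summary["Cov_ab"] / summary["V_a"]  # regression coefficient of B on A
summary["k_ab"] = summary["Cov_ab"] / summary["V_b"]  # regression coefficient of A on B

for name, value in summary.items():
    print(name, value)
```

Changing `faces` and the three toss counts gives the kind of self-made example the text recommends to the student; the brute-force listing is only practical for small numbers of tosses.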


Example 2 


(a) The umpire tosses twice a tetrahedral die with 1 pip on one face, 3 pips 
on another and 2 pips on each of the remaining pair. 

(b) Player A tosses once a tetrahedral die of which one face carries 1 pip 
and the other three faces 2 pips. 

(c) Player B tosses twice a flat circular die with 1 pip on one face and 2 
on the other. 
In this example the three dies are different, but in the symbolism of § 5 below, 


Py Pp b | 


The numerical results relevant to what follows are: 
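(i) $V_u = 1 = \mathrm{Cov}(x_a, x_b) = \mathrm{Cov}(x_a, x_u) = \mathrm{Cov}(x_b, x_u)$
(ii) $M_u = 4$ ; $M_a = 5\tfrac{3}{4}$ ; $M_b = 7$
(iii) $V_u = 1$ ; $V_a = 1\tfrac{3}{16}$ ; $V_b = 1\tfrac{1}{2}$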
(iv) = 1% = V,/V, ; = %*=V,/V, ; 
(v) =% + 
(vi) V(M,,) = *%001 V = Tap” 
(vii) $k_{ab} = \tfrac{2}{3} = V_u/V_b$.
Example 3 
Umpire and players A and B toss the same die which is a flat disc with 1 pip 
on one face and 2 pips on the other. 
(a) The umpire tosses twice, 
(b) player A tosses twice, 
(c) player B tosses three times. 
A’s score is his own score and three times that of the umpire. B’s score is his
own and twice that of the umpire. In this example $l \neq 1 \neq m$, in accordance
with (4.6). The numerical results relevant to theorems established below are:
(i) $lm \cdot V_u = 3 = \mathrm{Cov}(x_a, x_b)$ ; $l \cdot V_u = \tfrac{3}{2} = \mathrm{Cov}(x_a, x_u)$ ;
$m \cdot V_u = 1 = \mathrm{Cov}(x_b, x_u)$
(ii) $M_u = 3$ ; $M_a = 12$ ; $M_b = 10.5$
(iii) Vi=% +; ; 
(vi) V(M,,) =2 ; V(M,,)/V, = 


4. THE FUNDAMENTAL ALGEBRAIC PROPERTY OF THE 
UMPIRE BONUS MODEL 


Our numerical illustrations bring into focus what is the most 
general property of the model under consideration. In all circum- 
stances defined by (1.1), the covariance of the two score distributions 
is the variance of the distribution of the umpire’s score, i.e. of the 
source of concomitant variation. The proof requires no explicit specifi- 
cation of the properties of the die, if we proceed as follows; 


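$$\mathrm{Cov}(x_a, x_b) = E(x_a \cdot x_b) - M_a M_b,$$
$$E(x_a \cdot x_b) = E(x_u + x_{a.0})(x_u + x_{b.0})$$
$$= E(x_u^2) + E(x_u x_{a.0}) + E(x_u x_{b.0}) + E(x_{a.0} x_{b.0}),$$
$$M_a M_b = (M_u + M_{a.0})(M_u + M_{b.0})$$
$$= M_u^2 + M_u M_{a.0} + M_u M_{b.0} + M_{a.0} M_{b.0},$$
$$\therefore\ \mathrm{Cov}(x_a, x_b) = E(x_u^2) - M_u^2 + E(x_u x_{a.0}) - M_u M_{a.0}$$
$$+ E(x_u x_{b.0}) - M_u M_{b.0} + E(x_{a.0} x_{b.0}) - M_{a.0} M_{b.0}$$
$$= V_u + \mathrm{Cov}(x_u, x_{a.0}) + \mathrm{Cov}(x_u, x_{b.0}) + \mathrm{Cov}(x_{a.0}, x_{b.0}) = V_u,$$

the last three covariances being zero since the individual scores are independent of the umpire's score and of one another.

Whence from (2.1):

$$r_{ab} = V_u/\sqrt{V_a \cdot V_b}. \tag{4.1}$$

If we put $x_{b.0} = 0$ in the above, we obtain:

$$\mathrm{Cov}(x_a, x_u) = V_u,$$
$$r_{au} = V_u/\sqrt{V_u \cdot V_a} = \sqrt{V_u/V_a}. \tag{4.2}$$

By the same token:

$$r_{bu} = \sqrt{V_u/V_b}. \tag{4.3}$$

In accordance with (4.1)-(4.3) we may also write:

$$r_{ab} = r_{au} \cdot r_{bu}. \tag{4.4}$$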
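
Because the proof makes no assumption about the dice, (4.1) and (4.4) can be checked on a set-up in which all three dice differ. The sketch below is an illustration only: it builds each score-sum distribution by repeated convolution (a choice made here, not the staircase operation of the paper), and the names `pmf_of_sum` and `correlations` are assumptions. The dice are those described in Example 2.

```python
from fractions import Fraction as F
from math import isclose, sqrt

def pmf_of_sum(faces, tosses):
    """Exact distribution of the score sum of `tosses` independent throws of one die."""
    dist = {0: F(1)}
    for _ in range(tosses):
        nxt = {}
        for total, p in dist.items():
            for face in faces:
                nxt[total + face] = nxt.get(total + face, F(0)) + p / len(faces)
        dist = nxt
    return dist

def correlations(umpire, player_a, player_b):
    """Each argument is (faces, tosses). Returns (r_ab, r_au, r_bu) under rule (1.1)."""
    pu, pa, pb = (pmf_of_sum(*spec) for spec in (umpire, player_a, player_b))
    e = {}  # exact expectations of first and second moments
    for xu, qu in pu.items():
        for a0, qa in pa.items():
            for b0, qb in pb.items():
                p, xa, xb = qu * qa * qb, xu + a0, xu + b0  # totals include the bonus
                for key, val in (("u", xu), ("a", xa), ("b", xb),
                                 ("uu", xu * xu), ("aa", xa * xa), ("bb", xb * xb),
                                 ("ab", xa * xb), ("au", xa * xu), ("bu", xb * xu)):
                    e[key] = e.get(key, F(0)) + p * val

    def r(xy, x, y):
        covariance = e[xy] - e[x] * e[y]
        var_x = e[x + x] - e[x] ** 2
        var_y = e[y + y] - e[y] ** 2
        return float(covariance) / sqrt(float(var_x * var_y))

    return r("ab", "a", "b"), r("au", "a", "u"), r("bu", "b", "u")

# Dice of Example 2: umpire (1,2,2,3) twice; A (1,2,2,2) once; B (1,2) twice.
r_ab, r_au, r_bu = correlations(((1, 2, 2, 3), 2), ((1, 2, 2, 2), 1), ((1, 2), 2))
print(isclose(r_ab, r_au * r_bu))
```

Only the final square roots are taken in floating point; every moment is accumulated exactly, so any discrepancy in the comparison would reflect rounding alone.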


Without regard to circumstances which impose non-linear regression
in either dimension of the grid as illustrated by numerical examples
in §3 above, the maximum value of $r_{ab}$ is unity. This is evident if
we write (4.4) in the form:


$$\frac{1}{r_{ab}^2} = \frac{V_a}{V_u} \cdot \frac{V_b}{V_u} = \frac{V_u + V_{a.0}}{V_u} \cdot \frac{V_u + V_{b.0}}{V_u} = \left(1 + \frac{V_{a.0}}{V_u}\right)\left(1 + \frac{V_{b.0}}{V_u}\right). \tag{4.5}$$

In the limiting case which arises when we remove all freedom of $x_a$
and $x_b$ to vary independently, $x_{a.0} = 0 = x_{b.0}$ in (4.5) above and
$r_{ab} = 1$. In the limiting case, when there is no source of concomitant
variation, $x_u = 0$ and $r_{ab} = 0$. With due regard to preceding remarks
upon (1.4) above, the product-moment coefficient $r_{ab}$ has therefore
the essential summarising properties of a linear correlation coefficient
without restriction with respect to linearity of regression.

What is of equal interest, emerges from comparison of the form of 
(4.2) and (4.3) with that of (4.1). Within the framework of our 
basic assumption (1.1), we have a necessary and sufficient condition 
which permits us to discriminate between a relation of consequence 
and one of concurrence. Consequent relationship of a score $x_i$ with




respect to $x_j$ here signifies that the square of the product-moment
coefficient $r_{ij}$ is the ratio $V_j/V_i$. Unhappily this formal criterion is of
very restricted usefulness for a reason which emerges when we remove
a restriction hitherto imposed on the rules of the game. Although
(4.3) defines a sufficient, it does not define a necessary condition of a
consequent, in contradistinction to a concurrent, linear relationship in
its most general form.

We have hitherto confined ourselves strictly to the specification of 
a law of concomitant variation which is a particular variant of a more 
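general linear pattern numerically illustrated by Example 3 of § 3 above:


$$x_a = l \cdot x_u + x_{a.0} \ ; \quad x_b = m \cdot x_u + x_{b.0}. \tag{4.6}$$


In this case, trivial modification of the preceding reasoning leads to
the relations:


$$\mathrm{Cov}(x_a, x_b) = lm V_u \ ; \quad \mathrm{Cov}(x_a, x_u) = l V_u \ ; \quad \mathrm{Cov}(x_b, x_u) = m V_u. \tag{4.7}$$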


In view of previous comment on the heuristic interest of the umpire 
bonus model it will not be out of place to draw attention to the content 
of (4.1)-(4.3) as a teaching device. Statisticians who have experienced 
the difficulties of presenting correlation to beginners will note that 
their derivation by quite elementary methods exhibits covariance within 
the framework of our present assumptions as a measure of concomitant 
variation far more explicitly than is possible by reliance on geometrical 
representation; and every step in the argument derives its rationale 
from elementary numerical properties of a grid as easy to visualise as 
a graph. 

It is evident that (4.1) reduces to $r_{ab} = u/\sqrt{(u + a)(u + b)}$ when
the two players and the umpire employ one and the same die.⁵ The
original set-up prescribed by Elderton involves the additional limitation
that $a = (n - u) = b$. In that case, (4.1) reduces to Elderton's
formula cited above, i.e. $r_{ab} = u/n$.


⁵ Regardless of the nature of the die, i.e., the frequency distribution of the
system of scores, the variance ($V_1$) of the unit sample (single toss) score
distribution and that ($V_n$) of the score sum distribution with respect to an n-fold
replacement sample (n independent tosses) are connected by the relation $V_n = n \cdot V_1$. Thus
$V_u = u V_1$, $V_a = (u + a)V_1$ and $V_b = (u + b)V_1$. Hence


$$r_{ab} = u V_1 / V_1 \sqrt{(u + a)(u + b)} = u/\sqrt{(u + a)(u + b)}.$$




5. THE CONDITION OF LINEAR REGRESSION 


To establish a necessary condition of linear regression consonant 
with (2.2) or (2.3) and (1.1), we must take into account the 
minimum score values. We do not need to assume initially that the 
scores on the faces of the three dice assigned to the umpire and to 
the players (A or B) are consecutive, nor to place any restriction on 
the numerical value of the lowest score on the face of a die. We here 
denote the lowest face-scores respectively as $l_u$, $l_a$ and $l_b$. Evidently
the minimal score with respect to the umpire’s u-fold toss is $u \cdot l_u$, the
lowest total score of player A is $(a \cdot l_a + u \cdot l_u)$ and that of player B
is $(b \cdot l_b + u \cdot l_u)$. Evidently, likewise, the minimal value of $M_{ba}$ is
$M_{b.0} + u \cdot l_u$. We now transform (2.2) as follows:


$$k_{ba}(M_a - x_a) = M_b - M_{ba} = M_u + M_{b.0} - M_{ba},$$
$$k_{ba} = (M_u + M_{b.0} - M_{ba})/(M_a - x_a).$$

The mean value of $x_b$ associated with the minimal value of $x_a$
being $(M_{b.0} + u \cdot l_u)$ as stated above:

$$k_{ba} = (M_u - u \cdot l_u)/(M_a - a \cdot l_a - u \cdot l_u). \tag{5.1}$$

If regression is linear in the alternate dimension of the grid:

$$k_{ab} = (M_u - u \cdot l_u)/(M_b - b \cdot l_b - u \cdot l_u). \tag{5.2}$$


In virtue of (4.1) and (2.8), linear regression in one or other dimen-
sion of the grid signifies that:


$$k_{ba} = V_u/V_a \quad\text{and}\quad k_{ab} = V_u/V_b. \tag{5.3}$$


In order that regression may be linear in the appropriate dimension,
(5.1) or (5.2) must conform to the requirements of (5.3), i.e.

$$V_u/V_a = (M_u - u \cdot l_u)/(M_a - a \cdot l_a - u \cdot l_u),$$
$$V_u/V_b = (M_u - u \cdot l_u)/(M_b - b \cdot l_b - u \cdot l_u). \tag{5.4}$$
These equations define a necessary condition that regression is linear 
in accordance with (1.1). By recourse to the same considerations, we 
obtain the corresponding condition with respect to linearity of regression 
in accordance with the more general law of linear concomitant variation 
specified by (4.6): 
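$$l \cdot V_u/V_a = (M_u - u \cdot l_u)/(M_a - a \cdot l_a - l \cdot u \cdot l_u) \quad\text{and}$$
$$m \cdot V_u/V_b = (M_u - u \cdot l_u)/(M_b - b \cdot l_b - m \cdot u \cdot l_u). \tag{5.5}$$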




Subject to the same assumption, i.e. (4.6), the consequent relationship 
of the score of the player on that of the umpire implies that: 


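$$k_{au} = (M_{a.0} + l \cdot M_u - M_{au})/(M_u - x_u) \quad\text{and}$$
$$k_{bu} = (M_{b.0} + m \cdot M_u - M_{bu})/(M_u - x_u). \tag{5.6}$$


Since $\mathrm{Cov}(x_a, x_u) = l \cdot V_u$ and $\mathrm{Cov}(x_b, x_u) = m \cdot V_u$, linear regression
of the player’s score on that of the umpire also implies that


$$k_{au} = l \quad\text{and}\quad k_{bu} = m. \tag{5.7}$$


The minimal value of $x_u$ associated with a mean A-score
$M_{au} = (M_{a.0} + l \cdot u \cdot l_u)$ in (5.6) is $u \cdot l_u$ and the mean B-score associated therewith
is $M_{bu} = (M_{b.0} + m \cdot u \cdot l_u)$. By substitution in (5.6) we therefore
obtain in agreement with (5.7)


$$k_{au} = l \quad\text{and}\quad k_{bu} = m.$$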


Thus the necessary condition of linear relationship corresponding to 
(5.5) always holds good with respect to the consequent relationship 
of the player’s score on that of the umpire in contradistinction to the 
concurrent relationship between the player’s scores; but neither (5. 4) 
nor (5.5) is a sufficient condition of linear regression in the domain 
of concurrence. For linear regression of the B score on that of A 
implies that there is a unique value of $M_{ba}$ associated with every discrete
value of $x_a$; and this imposes a restriction on the scalar relations of
$x_{a.0}$ and $x_u$.

To exhibit this restriction, we shall assume that $x_{a.0}$ increases by
multiples of an increment $d_a$, and $x_u$ by multiples of an increment $d_u$.
For the general case of linear concomitant variation defined by (4.6),
it is evident that the successor of the minimum value of $x_a$, which we
may denote by $a_0 = (a \cdot l_a + l \cdot u \cdot l_u)$, may have either of the following
values in the absence of any restriction on the values of $d_a$ and $d_u$:


(i) $a_0 + d_a$ ; (ii) $a_0 + l \cdot d_u$.


Now the value of $M_{ba} = (M_{b.0} + m \cdot u \cdot l_u)$ associated with (5.1) is
the same as the value of $M_{ba}$ associated with $a_0$ as defined above. Hence
the value of $M_{ba}$ associated with $x_a = a_1$, the successor of $x_a = a_0$, will
differ from the value of $M_{ba}$ associated with the latter if, and only if:


$$l \cdot d_u \le d_a. \tag{5.8}$$


Evidently therefore linear regression of the score of player B on that 




of player A is consistent with the same scale of increase or decrease
with respect to the score of player A and that of the umpire only if $l = 1$.

As regards the consequent regression of the score of the player on
that of the umpire, we note that the minimal value of $x_u$, which
we may label as $u_0 = u \cdot l_u$, is associated with a mean A-score
$M_{au} = (M_{a.0} + l \cdot u \cdot l_u)$, and that its successor is unique being
$(u \cdot l_u + d_u)$. Hence no such restriction as the foregoing arises on
account of the scalar relations of $x_{a.0}$ and $x_u$; and we have already
seen that no restriction arises in virtue of their distributions from the
condition specified by (5.6). On the other hand (5.4) and (5.5)
impose in the domain of concurrence a restriction on linear regression
arising from the distribution of the scores $x_{a.0}$ and $x_u$ (with respect to
regression of B on A) or $x_{b.0}$ and $x_u$ (with respect to regression of A
on B). Our numerical examples of § 3 above illustrate circumstances
in which (5.4) and (5.5) do and do not hold good.

Subject to the scalar restriction defined by (5.8), we may first 
notice that (5.4) holds good if the players and the umpire toss the 
same die. In that case, we may write $l_a = l_u$, and in virtue of the law
of addition of means and variance:


$$V_u/V_a = u/(u + a) = M_u/M_a.$$

Hence in (5.4):


$$(M_u - u \cdot l_u)/[M_a - (u + a) l_u] = u/(u + a) = V_u/V_a.$$


If the dice are not the same, (5.4) is not necessarily an identity 
and linear concomitant variation in accordance with (1.1) is consistent 
with linear regression in one dimension only or in neither. Consider 
for instance the following situation: 


(i) the umpire tosses a sheared spherical die with $(s + 1)$ con-
secutive scores increasing by unit increment with frequencies defined
by successive terms of the expansion of $(p_u + q_u)^s$;

(ii) player A has a die with $(t + 1)$ scores with frequencies defined
by $(p_a + q_a)^t$;

(iii) the player A records as his total score the sum of the score
of the umpire and his individual score in accordance with (1.1).


We may therefore write:


$$V_u = u s p_u(1 - p_u) \ ; \quad V_a = u s p_u(1 - p_u) + a t p_a(1 - p_a),$$
$$M_u = u s p_u + u \cdot l_u \ ; \quad M_a = a t p_a + a \cdot l_a + u s p_u + u \cdot l_u.$$




The definitive equation (5.4) of linearity for the B-score on the A-score
then becomes:


$$a t p_a(1 - p_a)/u s p_u(1 - p_u) = a t p_a/u s p_u. \tag{5.9}$$


This equation (5.9) will be true if, and only if, $p_u = p_a$. A numerical
example cited in § 3 illustrates circumstances in which it is true when
the dice are not identical. Regression will be linear in neither dimension in a
set-up such as the following:


(a) the umpire tosses a cubical die; 


(b) player A tosses twice a tetrahedral die with 1 pip on one face and
2 pips on each of the 3 remaining faces;


(c) player B tosses four times a tetrahedral die with 3 pips on one 
face, 5 pips on another and 4 pips on each of the remaining faces. 


If we substitute for the cubical die of the umpire a flat disc with 2 pips
on one face and 3 on the other so that $p_u = \tfrac{1}{2} = p_b$ there will be linear
regression of the A-score on the B but non-linear regression in the
alternate dimension.
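
The two set-ups just described can be examined by listing the outcomes and testing whether the conditional means advance by equal increments. This is a minimal sketch under stated assumptions: the umpire is taken to toss once (the text does not state the number), totals follow (1.1), and the helper names `play`, `conditional_means` and `is_linear` are illustrative, not the authors'.

```python
from fractions import Fraction as F
from itertools import product

def play(umpire, player_a, player_b):
    """Each argument is (faces, tosses); returns equally likely (x_a, x_b) pairs."""
    (fu, nu), (fa, na), (fb, nb) = umpire, player_a, player_b
    return [(sum(tu) + sum(ta), sum(tu) + sum(tb))
            for tu in product(fu, repeat=nu)
            for ta in product(fa, repeat=na)
            for tb in product(fb, repeat=nb)]

def conditional_means(outcomes, given, of):
    """Exact mean of coordinate `of` for each observed value of coordinate `given`."""
    sums, counts = {}, {}
    for o in outcomes:
        sums[o[given]] = sums.get(o[given], 0) + o[of]
        counts[o[given]] = counts.get(o[given], 0) + 1
    return {x: F(sums[x], counts[x]) for x in sorted(sums)}

def is_linear(means):
    """True when the means change by one constant amount per unit of the border score."""
    xs = list(means)
    slopes = {(means[x1] - means[x0]) / (x1 - x0) for x0, x1 in zip(xs, xs[1:])}
    return len(slopes) <= 1

a_die = ((1, 2, 2, 2), 2)   # player A: tetrahedral die, two tosses
b_die = ((3, 4, 4, 5), 4)   # player B: tetrahedral die, four tosses
with_disc = play(((2, 3), 1), a_die, b_die)               # umpire: flat disc (assumed one toss)
with_cube = play(((1, 2, 3, 4, 5, 6), 1), a_die, b_die)   # umpire: cubical die (assumed one toss)

for label, grid in (("disc", with_disc), ("cube", with_cube)):
    a_on_b = is_linear(conditional_means(grid, given=1, of=0))
    b_on_a = is_linear(conditional_means(grid, given=0, of=1))
    print(label, "A on B linear:", a_on_b, "| B on A linear:", b_on_a)
```

Exact fractions are used so that "equal increments" is tested as a strict equality rather than to within a tolerance.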

The special interest which attaches to (5.4) turns on a semantic 
issue raised in §2. If it is appropriate to regard (1.1) as an equation 
definitive of a linear law of concomitant variation, we have to conclude 
that a linear law of concomitant variation is entirely consistent with 
situations in which neither of the identities definitive of linear regression 
in one or other dimension holds good. In short, we may say that: 


(a) linear regression of the consequent on the antecedent variate 
holds good without restriction on the scalar relations of the scores or 
on their distributions ; 

(b) within the framework of our present assumptions which exclude 
the postulate of continuity, linear regression of one concurrent variate 
on another implies special restriction both on the scalar relations and
distributions of the scores, and will rarely in fact occur when the law 
of concomitant variation itself is strictly linear. 


6. THE PARTITION OF VARIANCE 


In this context we employ the word partition in preference to analysis 
to avoid ambiguity with respect to different uses of ratios based upon 
(2. 14) and (2.15), namely: (a) to test the null hypothesis that the grid 
is homogeneous ; (b) to break down the variance of the distribution of the 
scores as a whole into components attributable to specifiable sources in a 




heterogeneous matrix. Our preliminary remarks have defined by (2. 5)
and (2.6) what we regard as a justifiable partition of variance attribu-
table to the components of the system defined by (1.1). Within the
framework of that restriction, it will now be convenient to distinguish
the partition of the player’s variance separately in the domain of
consequent and concurrent relationship. From (4.2),


$$C_a(x_a, x_u) = V_u/V_a = r_{au}^2. \tag{6.1}$$


Regression of the player’s score on that of the umpire is always linear,
subject to the relation $k_{au} = 1$, and hence from (2. 11):


$$V_u = V(M_{au}). \tag{6.2}$$

Whence we can define a component of consequent relationship:

$$C_a(x_a, x_u) = V(M_{au})/V_a. \tag{6.3}$$


In accordance with a universal numerical property of any grid we may 
also write in conformity with our own notation: 


$$V(M_{au}) + M(V_{au}) = V_a,$$


$$\frac{V(M_{au})}{V_a} + \frac{M(V_{au})}{V_a} = 1. \tag{6.4}$$


Within the domain of consequence, we can therefore define a residual
component of freedom:


$$R_a(x_a; x_u) = M(V_{au})/V_a. \tag{6.5}$$


It would be truly remarkable if a numerical property of any grid 
whatsoever, and one deducible without recourse to any statistical 
assumptions, proved in all circumstances to have such a unique
signification as has (6.4) vis-à-vis (6.3) and (6.5). There is therefore
no occasion for dismay, if we find no comparable relation valid in the 
domain of concurrence. In accordance with (2.5), we have then to 
write for the component of the variance of A’s score attributable to a 
source common to that of B: 


$$C_a(x_a, x_b) = V_u/V_a = r_{au}^2. \tag{6.6}$$


In the limit, when we remove all independent variability of the player’s
scores by imposing the condition $x_{a.0} = 0 = x_{b.0}$, it would of course
be true to write:

$$C_a(x_a, x_b) = r_{ab}^2 = V(M_{au})/V_a.$$




This is true in the limit, because the following identities then hold 
good : 
$$V(M_{au}) = V_a \ ; \quad r_{ab} = 1 = r_{au} \ ; \quad M(V_{au}) = 0.$$


Also in the limit, when we remove the source of concomitant variation 
by imposing the condition $x_u = 0$:


$$R_a(x_a, x_b) = M(V_{au})/V_a.$$
Again this is true only because: 


Except at the limits specified, $r_{ab}$ will not be equal to $r_{au}$, and except
at these limits, in which case the identity is trivial, $r_{ab}^2$ is not a
correct measure of explained variation. The square of the product-
moment coefficient with respect to 2 scores subject to linear concomitant 
variation has this meaning if, and only if, the relationship involved is 
consequent in our sense of the term. It goes without saying that we 
can write (6.6) above in virtue of (4.1): 


$$C_a(x_a, x_b) = r_{ab}(\sigma_b/\sigma_a). \tag{6.7}$$


When regression is linear, the regression coefficient of the score of B 
on A, which is also the regression coefficient of the umpire’s score on 
that of A, is therefore the proportionate component of concomitant 
variation as we define it in §1; and the existence of linear regression 
is in fact irrelevant to the formal identity of (6.7). This result inflicts 
no injury on common sense, if we reflect upon the fact that the total 
score A shares with that of B is precisely what the total score of A 
shares with that of the umpire. 

For the more general case of linear concomitant variation defined 
by (4.6): 

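$$V_a = l^2 V_u + V_{a.0} \quad\text{and}\quad V_b = m^2 V_u + V_{b.0},$$


$$C_a(x_a, x_u) = l^2 V_u/V_a = C_a(x_a, x_b), \tag{6.8}$$


$$C_b(x_b, x_u) = m^2 V_u/V_b = C_b(x_a, x_b).$$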
From (4.8): 


Se 


Tos. 




In a purely consequent relationship (6.1) is therefore valid for the 
general case ; but no formal relationship covers the case of concurrence, 
unless we impose on a system of linear concomitant variation the 
restriction | 


7. PARTIAL CORRELATION


The outcome of §5 above was to emphasise the very exiguous 
relevance of linear regression to a linear law of concomitant variation. 
We shall now show that the fundamental theorem of partial correlation 
is valid in all circumstances consistent with a linear law of concomitant 
variation as prescribed by (4.6) without invoking the postulate of 
linear regression with respect to variates involved; but it will best 
bring into focus the heuristic value of our model as an introduction 
to the concept of partial correlation, if we first proceed within the 
framework of the restriction implicit in (1.1) or (1.4). With that 
end in view, we must now elaborate the prescription of the rules of the 
game to accommodate the contribution of more than one umpire to the 
player’s total. It will scarcely be necessary to show that results 
exhibited with respect to the dependence of concurrent variates on either 
or both of two common sources of variation are amenable to indefinite 
extension ; and we shall therefore exhibit the outlines of the theory of 
partial correlation only for the double-bonus situation. 

If two players (A and B) receive the same bonus equivalent to the 
total score of each of two umpires (U and W), we can ask what residual 
correlation would exist, if either umpire withheld his bonus; and it is 
then necessary to refine our conventions. We shall henceforth designate 
2 coefficients respectively as: (a) $r_{ab.w}$ with respect to the players’ total
scores in the absence of a bonus from an umpire W; (b) $r_{ab.u}$ with
respect to the players’ total scores in the absence of a bonus from an
umpire U. In conformity with this symbolism:


$V_{a.w}$ is the variance of the distribution of the joint individual
score of the umpire U and player A;


$V_{a.u}$ is the variance of the distribution of the joint individual
score of the umpire W and player A;


$\mathrm{Cov}(x_a x_u)_w$ is the covariance of the score of umpire U and the joint
individual scores of U and A;


$\mathrm{Cov}(x_a x_w)_u$ is the covariance of the score of umpire W and the joint
individual scores of W and A;



138 LANCELOT HOGBEN AND KENNETH W. KEMP 


$\mathrm{Cov}(x_a, x_b)_w$ is the covariance of the total scores of A and B in the
absence of the contribution of umpire W;


$\mathrm{Cov}(x_a, x_b)_u$ is the covariance of the total scores of A and B in the
absence of the contribution of umpire U.


Symbols for corresponding parameters of the B-score distribution in 
the absence of a contribution from one or other umpire are analogous 
to the above. Without the additional subscript, $\mathrm{Cov}(x_a, x_b)$, $r_{ab}$, $V_a$ and $V_b$
refer to total score distributions involving the double bonus. The
symbols $V_{a.o}$ and $V_{b.o}$ now refer to the individual score distributions
when the players receive neither bonus. The following relations must 
exist between the variances in virtue of the fact that the individual 
player’s own score is independent of that of either umpire and the score 
of one umpire is independent of that of the other: 


$V_a = V_{a.o} + V_u + V_w$ ; $V_b = V_{b.o} + V_u + V_w$. (7.1)
$V_a - V_w = V_{a.o} + V_u = V_{a.w}$. (7.2)
Similarly:
$V_a - V_u = V_{a.u}$ ; $V_b - V_u = V_{b.u}$. (7.3)
$V_b - V_w = V_{b.w}$. (7.4)
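
The additive relations (7.1)-(7.4) can be checked by exact enumeration for any choice of independent score laws. The following Python sketch assumes, purely for illustration, that player A and umpire U each throw a fair six-faced die and that umpire W tosses a coin scoring 0 or 1; neither these laws nor the helper names `variance` and `total_score_law` come from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical score laws (score -> probability); assumptions for illustration.
die = {s: Fraction(1, 6) for s in range(1, 7)}   # player A's own score, umpire U
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # umpire W


def variance(law):
    """Exact variance of a discrete score law."""
    mean = sum(p * s for s, p in law.items())
    return sum(p * (s - mean) ** 2 for s, p in law.items())


def total_score_law(*laws):
    """Exact law of the sum of independent scores."""
    total_law = {}
    for combo in product(*(law.items() for law in laws)):
        total = sum(score for score, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        total_law[total] = total_law.get(total, 0) + prob
    return total_law


V_ao, V_u, V_w = variance(die), variance(die), variance(coin)
V_a = variance(total_score_law(die, die, coin))   # own score + both bonuses
V_a_w = variance(total_score_law(die, die))       # bonus from W withheld
V_a_u = variance(total_score_law(die, coin))      # bonus from U withheld

print(V_a == V_ao + V_u + V_w)   # relation (7.1)
print(V_a - V_w == V_a_w)        # relation (7.2)
print(V_a - V_u == V_a_u)        # relation (7.3)
```

Exact fractions are used so that the equalities hold without rounding; the same check applies to player B's scores.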


By definition in accordance with the derivation of (4.1): 
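$\mathrm{Cov}(x_a, x_b) = V_u + V_w$,


and
$r_{ab} = (V_u + V_w)/\sqrt{V_a V_b}$. (7.5)

Similarly we may put:
$r_{ab.w} = V_u/\sqrt{V_{a.w} V_{b.w}}$; (7.6)
$r_{aw} = V_w/\sqrt{V_a V_w} = \sqrt{V_w/V_a}$, (7.7)
$r_{bw} = V_w/\sqrt{V_b V_w} = \sqrt{V_w/V_b}$, (7.8)
$r_{aw} r_{bw} = V_w/\sqrt{V_a V_b}$. (7.9)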


Hence from (7.5): 


$r_{ab} - r_{aw} r_{bw} = (V_u + V_w)/\sqrt{V_a V_b} - V_w/\sqrt{V_a V_b} = V_u/\sqrt{V_a V_b}$. (7.10)



CORRELATION MODELS 139 
Also from (7.7) and (7.8): 
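$1 - r_{aw}^2 = (V_a - V_w)/V_a$ ; $1 - r_{bw}^2 = (V_b - V_w)/V_b$.
Hence from (7.2) and (7.4):
$\sqrt{(1 - r_{aw}^2)(1 - r_{bw}^2)} = \sqrt{V_{a.w} V_{b.w}}/\sqrt{V_a V_b}$. (7.11)
By combining (7.10) and (7.11) we get:
$(r_{ab} - r_{aw} r_{bw})/\sqrt{(1 - r_{aw}^2)(1 - r_{bw}^2)} = V_u/\sqrt{V_{a.w} V_{b.w}}$.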


Hence in accord with (7.6) above, we arrive at the fundamental theorem 
of partial correlation in its simplest form with respect to the residual 
concurrent relationship of $x_a$ and $x_b$ in the absence of any contribution
of the second umpire (W) or, what comes to the same thing, an invariable
contribution therefrom: 


$r_{ab.w} = (r_{ab} - r_{aw} r_{bw})/\sqrt{(1 - r_{aw}^2)(1 - r_{bw}^2)}$. (7.12)
In the same way, we obtain:
$r_{ab.u} = (r_{ab} - r_{au} r_{bu})/\sqrt{(1 - r_{au}^2)(1 - r_{bu}^2)}$. (7.13)
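
As a numerical check on (7.12) and (7.13), the sketch below builds every coefficient from four variances and compares the partial-correlation formula with the direct expression of the form (7.6). The variance values are arbitrary assumptions, and the helper name `partial_r` is hypothetical; the expression used for the case in which U withholds his bonus is the analogue of (7.6) with the two umpires interchanged.

```python
from math import isclose, sqrt


def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held constant, as in (7.12)."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Hypothetical variances of the four independent scores.
V_ao, V_bo, V_u, V_w = 2.0, 3.0, 1.5, 0.5

V_a = V_ao + V_u + V_w                      # (7.1)
V_b = V_bo + V_u + V_w

r_ab = (V_u + V_w) / sqrt(V_a * V_b)        # (7.5)
r_aw, r_bw = sqrt(V_w / V_a), sqrt(V_w / V_b)   # (7.7), (7.8)
r_au, r_bu = sqrt(V_u / V_a), sqrt(V_u / V_b)   # same form, umpire U

# Direct values when one umpire withholds his bonus, as in (7.6).
r_ab_w_direct = V_u / sqrt((V_a - V_w) * (V_b - V_w))
r_ab_u_direct = V_w / sqrt((V_a - V_u) * (V_b - V_u))

print(isclose(partial_r(r_ab, r_aw, r_bw), r_ab_w_direct))   # (7.12)
print(isclose(partial_r(r_ab, r_au, r_bu), r_ab_u_direct))   # (7.13)
```

No assumption about the form of the score distributions enters the calculation; only the variances and the independence of the four scores are used.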


In the more general case when the players receive as bonus different 
multiples of the score of each of the two umpires, i.e.,


$x_a = x_{a.o} + (\ldots)\,x_u + (\ldots)\,x_w$,


$x_b = x_{b.o} + (\ldots)\,x_u + (\ldots)\,x_w$, (7.14)


it is easy to develop the relations corresponding to (7.1)-(7.13), and in
particular, to show that (7.13) holds when $x_a$ and $x_b$ are defined by
(7.14). Thus the fundamental theorem of partial correlation, which
is easy to extend to any number of concurrent variates ($x_a$, $x_b$, $x_c$, ...),
holds good within the entire domain of linear concomitant variation 
without invoking the postulate of linear regression at any stage in the 
argument. 
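
The claim can be illustrated by exact enumeration with score laws that are deliberately skew and discrete. In the Python sketch below the four score laws and the four bonus multipliers (`L_U`, `L_W`, `M_U`, `M_W`) are arbitrary assumptions, and all function names are hypothetical. The residual correlation is computed twice: from the formula (7.12), and directly from the scores with umpire W's contribution removed.

```python
from itertools import product
from math import isclose, sqrt

# Hypothetical, deliberately non-normal score laws (score -> probability).
law_ao = {0: 0.7, 5: 0.3}
law_bo = {1: 0.2, 2: 0.5, 9: 0.3}
law_u = {0: 0.5, 1: 0.1, 4: 0.4}
law_w = {1: 0.25, 2: 0.25, 3: 0.5}

# Hypothetical bonus multipliers for players A and B.
L_U, L_W = 2.0, 0.5
M_U, M_W = 1.0, 3.0


def outcomes():
    """Yield (probability, x_ao, x_bo, x_u, x_w) for every joint outcome."""
    for (ao, p1), (bo, p2), (u, p3), (w, p4) in product(
        law_ao.items(), law_bo.items(), law_u.items(), law_w.items()
    ):
        yield p1 * p2 * p3 * p4, ao, bo, u, w


def corr(f, g):
    """Exact product-moment correlation of two functions of the outcome."""
    rows = [(p, f(*xs), g(*xs)) for p, *xs in outcomes()]
    mf = sum(p * x for p, x, _ in rows)
    mg = sum(p * y for p, _, y in rows)
    cov = sum(p * (x - mf) * (y - mg) for p, x, y in rows)
    vf = sum(p * (x - mf) ** 2 for p, x, _ in rows)
    vg = sum(p * (y - mg) ** 2 for p, _, y in rows)
    return cov / sqrt(vf * vg)


def x_a(ao, bo, u, w): return ao + L_U * u + L_W * w
def x_b(ao, bo, u, w): return bo + M_U * u + M_W * w
def x_w(ao, bo, u, w): return w
def x_a_without_w(ao, bo, u, w): return ao + L_U * u
def x_b_without_w(ao, bo, u, w): return bo + M_U * u


r_ab, r_aw, r_bw = corr(x_a, x_b), corr(x_a, x_w), corr(x_b, x_w)
r_ab_w_formula = (r_ab - r_aw * r_bw) / sqrt((1 - r_aw**2) * (1 - r_bw**2))
r_ab_w_direct = corr(x_a_without_w, x_b_without_w)

print(isclose(r_ab_w_formula, r_ab_w_direct, rel_tol=1e-9))
```

Nothing in the calculation requires the regression of one total score on the other to be linear; the agreement rests only on the linear composition of the scores and the independence of their sources.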
8. A PHYSICAL ANALOGY TO THE MODEL 

Within certain limits the law relating the load to the extension of 
a spring is very closely linear. Accordingly, it may help to clarify the 
relevance to practical affairs of logical relations examined against the 
background of the statistical model dealt with above, if we now formulate 
a comparable physical situation. We suppose that we have two springs
A and B whose extension we measure by different verniers, severally
liable to occasion errors of measurement subject to laws peculiar to 
themselves. If we make a scatter diagram of the extension of one spring 
with respect to that of the other for equal loads, our experiment conforms 
to the assessment of a concurrent relationship between A and B scores 
as defined in § 2, and an extension-load scatter diagram for either spring 
would embody the assessment of a consequent relationship as between 
A (or B) and U scores. 

If we write $t_a$ for the true value of the observed extension $x_a$ of the
spring A and $x_{a.o}$ for the error of measurement, and specify the load as $x_u$,
Hooke's law implies that:


$x_a = t_a + x_{a.o} = l \cdot x_u + x_{a.o}$.


If the two springs have the same elastic modulus and the same dimen- 
sions, we may write in conformity with (1.1): 


Contrariwise, the assertion that the springs have different coefficients 
of extension conforms to the more general law specified by (4.6) with 
the implication that $l$ and $m$ are unequal in the identities $t_a = l \cdot x_u$
and $t_b = m \cdot x_u$. If we specify by $x_a$ and $x_b$ observed values of the
extension of one and the same spring, respectively determined by verniers
A and B with different error laws,* the restriction implicit in (1.1)
necessarily holds good. 

The foregoing is a physical situation of which we are entitled to 
postulate: (a) linear concomitant variation, insofar as Hooke's law correctly
describes it; (b) a narrow range of discrete errors consistent with
competent observation. If the latter condition holds good, i.e., if we reject
the postulate that our errors have a continuous distribution, the conclusions
established by the foregoing analysis signify:


(i) regression of the extension of either spring with respect to the
load would be linear in all circumstances;


(ii) regression with respect to paired measurements of the extension
of one and the same spring with different verniers would be linear in
both directions only if the error law of one or other vernier had a specific
relationship to the distribution law of the loads themselves, a relationship
which would rarely, if ever, occur in practice;


(iii) regression of the extension of two springs on one another will
not be linear, if they have different coefficients of extension.


* Since common laboratory operations often ring the changes on not more than
3 scale divisions consistent with competent workmanship, the postulate of a
continuum in the realm of errors of measurement may be as irrelevant to this
situation as to the issues dealt with in previous sections.
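
Conclusions (i) and (ii) can be illustrated with a small exact calculation. The Python sketch below assumes three equally likely loads, an extension of 10 scale divisions per unit load, and vernier errors of -1, 0 or +1 division with equal probability; these figures, and the helper names `regression_points` and `is_linear`, are assumptions made for the example only.

```python
from fractions import Fraction
from itertools import product

L = 10  # hypothetical extension (scale divisions) per unit load
third = Fraction(1, 3)
loads = {0: third, 1: third, 2: third}            # assumed law of the loads
vernier_error = {-1: third, 0: third, 1: third}   # assumed error law


def regression_points(triples):
    """Mean of y for each distinct x, from (probability, x, y) triples."""
    weight, weighted_y = {}, {}
    for p, x, y in triples:
        weight[x] = weight.get(x, 0) + p
        weighted_y[x] = weighted_y.get(x, 0) + p * y
    return sorted((x, weighted_y[x] / weight[x]) for x in weight)


def is_linear(points):
    """True when every (x, mean y) point lies on one straight line."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return all((y - y0) * (x1 - x0) == (y1 - y0) * (x - x0) for x, y in points)


# One and the same spring read by verniers A and B with independent errors.
joint = [
    (pu * pa * pb, u, L * u + ea, L * u + eb)
    for (u, pu), (ea, pa), (eb, pb) in product(
        loads.items(), vernier_error.items(), vernier_error.items()
    )
]

extension_on_load = regression_points((p, u, xa) for p, u, xa, _ in joint)
b_on_a = regression_points((p, xa, xb) for p, _, xa, xb in joint)

print(is_linear(extension_on_load))   # conclusion (i)
print(is_linear(b_on_a))              # conclusion (ii) for this error law
```

With these assumed laws the mean reading of vernier A rises in exact proportion to the load, whereas the mean reading of vernier B for a given reading of vernier A moves in steps, so that the regression of one reading on the other is not linear.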


9. SUMMARY 


1. This communication deals with the properties of a model of which 
we can predicate a strictly linear law of concomitant variation in 
any meaningful sense of the term. 


2. We can explore the numerical properties of the model by a simple
procedure which is easy to visualise and to carry through without
recourse to laborious computation.


3. The familiar relations of regression and partial correlation within
the framework of the prescribed rules of the game are derivable by
very elementary algebraic methods without recourse to geometrical
concepts at any stage.


4. The rules admit the possibility of exploring what algebraic properties
of the product-moment coefficient are pertinent to relationships
severally distinguished as consequence and concurrence.


5. Though the law of concomitant variation implicit in the rules is
strictly linear, regression of the concurrent variates may be linear
in both dimensions, in one only and in neither. A necessary condition
for linear regression is easily definable.


6. When the law of concomitant variation is linear the product-
moment formula retains the essential summarising properties of a
correlation coefficient even if regression is non-linear in both
dimensions.


7. Since the correlation ratio is in fact equivalent to the square of 
the product-moment coefficient only if regression is linear in the 
appropriate dimension, the properties of the model bring sharply 
into focus the need for distinguishing between the concept of linear 
correlation and linear regression. 


8. The square of the product-moment coefficients defines the fraction
of concomitant variance of the player's score distribution as herein
defined only in special circumstances. It never does so in the
domain of concurrent, as opposed to consequent, relationship except 
in the trivial cases $r = 0$ or $r = 1$. Accordingly, the indiscriminate
use of the square of the product-moment formula as a so-called
coefficient of determination or explained variation when regression 
is linear is unjustifiable. 

9. The fundamental theorem of partial correlation is valid in all 
circumstances prescribed by a strictly linear law of concomitant 
variation regardless of conditions which prescribe linear regression 
of variates involved. 


10. One conclusion which emerges from an examination of the properties 
of this and of other models to be the subject of subsequent treatment 
is that the approach to correlation by geometrical methods suggested 
by the customary derivation of physical laws is too general to 
accommodate specific qualitative peculiarities of the manifold
circumstances in which correlations emerge in the domain of biology
and sociology. 
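
The distinction drawn in item 8 of the summary can be made concrete with a few lines of arithmetic. The Python sketch below assumes a single common bonus and arbitrary variances; the coefficients are written by analogy with (7.6) and (7.7), and the variable names are hypothetical.

```python
from math import isclose, sqrt

# Hypothetical variances for the single-bonus game.
V_ao = V_bo = 3.0   # the players' own scores
V_u = 1.0           # the umpire's score, received by both as bonus

V_a, V_b = V_ao + V_u, V_bo + V_u

fraction_from_umpire = V_u / V_a    # share of A's variance due to the bonus
r_au = sqrt(V_u / V_a)              # consequent relationship (A with U)
r_ab = V_u / sqrt(V_a * V_b)        # concurrent relationship (A with B)

print(isclose(r_au**2, fraction_from_umpire))   # square of r gives the share
print(isclose(r_ab**2, fraction_from_umpire))   # square of r does not
```

With these figures the squared coefficient for the consequent relationship reproduces the umpire's share of the variance, while the squared coefficient for the concurrent relationship falls well short of it.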


It is a pleasure to acknowledge suggestions for clarifying the text of the 
above by the Editor, by Dr. Enid Charles and by Mr. C. A. B. Smith. 


